linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/5] Support multiple pages allocation
@ 2013-07-03  8:34 Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-03  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, David Rientjes, Glauber Costa, Johannes Weiner,
	KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Minchan Kim,
	Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Hello.

This patchset introduces a multiple-page allocation feature to the buddy
allocator. Currently, there is no way to allocate multiple pages at
once, so we must invoke the single-page allocation logic repeatedly.
This incurs overheads such as the cost of a function call with many
arguments and the cost of finding the proper node and zone for every
page.

With this patchset, we can reduce these overheads.
Below are some experimental results from an allocation test.
I ran the test on the following setup.
CPU: 4 cpus, 3.00GHz.
RAM: 4 GB
Kernel: v3.10 vanilla

Each result is an average of 20 runs.

Each entry is the allocation time in microseconds followed by the
improvement over 'Before' in percent ('Before' itself shows 0).

Before			Patched	1 page		Patched	2 page		Patched	4 page
--------------------------------------------------------------------------------------
128KB	5.3	0	4.45	16.04%		3.25	38.68%		3.75	29.25%
256KB	13.15	0	10.15	22.81%		8.8	33.08%		8.5	35.36%
512KB	72.3	0	34.65	52.07%		82.65	-14.32%		25	65.42%
1024KB	114.9	0	112.95	1.70%		87.55	23.80%		64.7	43.69%
2MB	131.65	0	102.35	22.26%		91.95	30.16%		126.05	4.25%
4MB	225.55	0	213.2	5.48%		181.95	19.33%		200.8	10.97%
8MB	408.6	0	442.85	-8.38%		350.4	14.24%		365.15	10.63%
16MB	730.55	0	683.35	6.46%		735.5	-0.68%		698.3	4.41%
32MB	1682.6	0	1665.85	1.00%		1445.1	14.12%		1157.05	31.23%
64MB	3229.4	0	3463.2	-7.24%		2538.4	21.40%		1850.55	42.70%
128MB	5465.6	0	4816.2	11.88%		4448.3	18.61%		3528.25	35.45%
256MB	9526.9	0	10091.75 -5.93%		8514.5	10.63%		7978.2	16.26%
512MB	19029.05 0	20079.7	-5.52%		17059.05 10.35%		14713.65 22.68%
1024MB	37284.9	0	39453.75 -5.82%		32969.7	11.57%		28161.65 24.47%



Before			Patched	8 page		Patched	16 page		Patched	32 page
---------------------------------------------------------------------------------------
128KB	5.3	0	3.05	42.45%		2.65	50.00%		2.85	46.23%
256KB	13.15	0	8.2	37.64%		7.45	43.35%		7.95	39.54%
512KB	72.3	0	16.8	76.76%		17.7	75.52%		14.55	79.88%
1024KB	114.9	0	60.05	47.74%		93.65	18.49%		74.2	35.42%
2MB	131.65	0	119.8	9.00%		72.6	44.85%		84.7	35.66%
4MB	225.55	0	227.3	-0.78%		149.95	33.52%		153.6	31.90%
8MB	408.6	0	372.5	8.84%		304.95	25.37%		340.55	16.65%
16MB	730.55	0	772.2	-5.70%		567.4	22.33%		618.3	15.37%
32MB	1682.6	0	1217.7	27.63%		1098.25	34.73%		1168.7	30.54%
64MB	3229.4	0	2237.75	30.71%		1817.8	43.71%		1998.25	38.12%
128MB	5465.6	0	3504.25	35.89%		3466.75	36.57%		3159.35	42.20%
256MB	9526.9	0	7071.2	25.78%		7095.05	25.53%		6800.9	28.61%
512MB	19029.05 0	13640.85 28.32%		13098.2	31.17%		12778.1	32.85%
1024MB	37284.9	0	25897.15 30.54%		24875.6	33.28%		24179.3	35.15%



For single-page allocations, this patchset makes the allocator slower
than before (-5%). But when allocating more pages at once, it makes the
allocator considerably faster.

As a first user, we can apply this feature to the page cache readahead
logic, which currently allocates single pages repeatedly. A sample
implementation is attached to this patchset (patches 2-5).
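
To illustrate the proposed calling convention, here is a minimal
userspace model in plain C (malloc() stands in for the buddy allocator;
all names here are illustrative, not the kernel API): nr_pages is an
in/out argument, the pages array receives the results, and the return
value is the first page, mirroring the existing single-page return.

#include <stdio.h>
#include <stdlib.h>

/*
 * Model of the proposed contract: on entry *nr_pages holds the number
 * of pages wanted; on return it holds the number actually allocated,
 * and pages[0..*nr_pages-1] are filled.  The return value is pages[0],
 * or NULL if nothing could be allocated.
 */
static void *alloc_batch(unsigned long *nr_pages, void **pages)
{
	unsigned long i, want = *nr_pages;

	for (i = 0; i < want; i++) {
		pages[i] = malloc(4096);	/* stand-in for one page */
		if (!pages[i])
			break;
	}
	*nr_pages = i;
	return i ? pages[0] : NULL;
}

int main(void)
{
	void *pages[16];
	unsigned long nr = 16;

	if (!alloc_batch(&nr, pages)) {
		fprintf(stderr, "no memory\n");
		return 1;
	}
	/* The caller must cope with getting a partial batch back. */
	printf("asked for 16, got %lu\n", nr);

	while (nr)
		free(pages[--nr]);
	return 0;
}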

The current implementation is not yet complete. Before polishing this
feature, I want to hear expert opinion. I do not have any particular
problem with the current allocator; however, I think we will need this
feature soon, because device I/O is getting faster rapidly and the
allocator should keep up with that speed.

Thanks.

Joonsoo Kim (5):
  mm, page_alloc: support multiple pages allocation
  mm, page_alloc: introduce alloc_pages_exact_node_multiple()
  radix-tree: introduce radix_tree_[next/prev]_present()
  readahead: remove end range check
  readhead: support multiple pages allocation for readahead

 include/linux/gfp.h        |   16 ++++++++++--
 include/linux/pagemap.h    |   19 +++++++++-----
 include/linux/radix-tree.h |    4 +++
 lib/radix-tree.c           |   34 ++++++++++++++++++++++++
 mm/filemap.c               |   18 ++++++++-----
 mm/mempolicy.c             |    6 +++--
 mm/page_alloc.c            |   62 +++++++++++++++++++++++++++++++++++---------
 mm/readahead.c             |   46 ++++++++++++++++++++++----------
 8 files changed, 162 insertions(+), 43 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
@ 2013-07-03  8:34 ` Joonsoo Kim
  2013-07-03 15:57   ` Christoph Lameter
  2013-07-10 22:52   ` Dave Hansen
  2013-07-03  8:34 ` [RFC PATCH 2/5] mm, page_alloc: introduce alloc_pages_exact_node_multiple() Joonsoo Kim
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-03  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, David Rientjes, Glauber Costa, Johannes Weiner,
	KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Minchan Kim,
	Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

This patch introduces a multiple-page allocation feature to the
buddy allocator. Currently, there is no way to allocate multiple
pages at once, so we must invoke the single-page allocation logic
repeatedly. This incurs overheads such as function-call overhead
with many arguments and the overhead of finding the proper node
and zone.

With this patchset, we can reduce these overheads. Device I/O is
getting faster rapidly and the allocator should keep up with that
speed. This patch helps with that situation.

This patch introduces two new arguments, nr_pages and pages, to
the core function of the allocator, and tries to allocate multiple
pages on the first attempt (the fast path). I think multiple-page
allocation is not appropriate for the slow path, so the current
implementation covers only the fast path.
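
As an illustration only (not the kernel code), here is a runnable
userspace sketch of the fast-path batching implemented in the hunk
below, with a toy zone and a single watermark; the point is that the
watermark is re-checked between pages so that a batch cannot drain the
zone below its low watermark. All names here are made up.

#include <stdio.h>

struct toy_zone {
	unsigned long nr_free;		/* free pages in the zone */
	unsigned long low_watermark;	/* do not batch below this */
};

/* Pretend to take one page off the zone's free list. */
static int toy_rmqueue(struct toy_zone *z)
{
	if (!z->nr_free)
		return 0;
	z->nr_free--;
	return 1;
}

/*
 * Take pages one by one, re-checking the watermark between pages --
 * the same idea as the zone_watermark_ok() re-check in the patch.
 */
static unsigned long batch_from_zone(struct toy_zone *z, unsigned long want)
{
	unsigned long got = 0;

	while (got < want) {
		if (z->nr_free <= z->low_watermark)
			break;
		if (!toy_rmqueue(z))
			break;
		got++;
	}
	return got;
}

int main(void)
{
	struct toy_zone z = { .nr_free = 10, .low_watermark = 4 };
	unsigned long got = batch_from_zone(&z, 32);

	printf("got %lu of 32 requested, %lu pages still free\n",
	       got, z.nr_free);
	return 0;
}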

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb..8bfa87b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -298,13 +298,15 @@ static inline void arch_alloc_page(struct page *page, int order) { }
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
-		       struct zonelist *zonelist, nodemask_t *nodemask);
+		       struct zonelist *zonelist, nodemask_t *nodemask,
+		       unsigned long *nr_pages, struct page **pages);
 
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order,
 		struct zonelist *zonelist)
 {
-	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
+	return __alloc_pages_nodemask(gfp_mask, order,
+				zonelist, NULL, NULL, NULL);
 }
 
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7431001..b17e48c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2004,7 +2004,8 @@ retry_cpuset:
 	}
 	page = __alloc_pages_nodemask(gfp, order,
 				      policy_zonelist(gfp, pol, node),
-				      policy_nodemask(gfp, pol));
+				      policy_nodemask(gfp, pol),
+				      NULL, NULL);
 	if (unlikely(mpol_needs_cond_ref(pol)))
 		__mpol_put(pol);
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
@@ -2052,7 +2053,8 @@ retry_cpuset:
 	else
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_zonelist(gfp, pol, numa_node_id()),
-				policy_nodemask(gfp, pol));
+				policy_nodemask(gfp, pol),
+				NULL, NULL);
 
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3edb62..1fbbc9a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1846,7 +1846,8 @@ static inline void init_zone_allows_reclaim(int nid)
 static struct page *
 get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 		struct zonelist *zonelist, int high_zoneidx, int alloc_flags,
-		struct zone *preferred_zone, int migratetype)
+		struct zone *preferred_zone, int migratetype,
+		unsigned long *nr_pages, struct page **pages)
 {
 	struct zoneref *z;
 	struct page *page = NULL;
@@ -1855,6 +1856,8 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 	nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */
 	int zlc_active = 0;		/* set if using zonelist_cache */
 	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
+	unsigned long count = 0;
+	unsigned long mark;
 
 	classzone_idx = zone_idx(preferred_zone);
 zonelist_scan:
@@ -1902,7 +1905,6 @@ zonelist_scan:
 
 		BUILD_BUG_ON(ALLOC_NO_WATERMARKS < NR_WMARK);
 		if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
-			unsigned long mark;
 			int ret;
 
 			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
@@ -1966,10 +1968,30 @@ zonelist_scan:
 		}
 
 try_this_zone:
-		page = buffered_rmqueue(preferred_zone, zone, order,
-						gfp_mask, migratetype);
-		if (page)
+		do {
+			page = buffered_rmqueue(preferred_zone, zone, order,
+							gfp_mask, migratetype);
+			if (!page)
+				break;
+
+			if (!nr_pages) {
+				count++;
+				break;
+			}
+
+			pages[count++] = page;
+			if (count >= *nr_pages)
+				break;
+
+			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
+			if (!zone_watermark_ok(zone, order, mark,
+					classzone_idx, alloc_flags))
+				break;
+		} while (1);
+
+		if (count > 0)
 			break;
+
 this_zone_full:
 		if (IS_ENABLED(CONFIG_NUMA))
 			zlc_mark_zone_full(zonelist, z);
@@ -1981,6 +2003,12 @@ this_zone_full:
 		goto zonelist_scan;
 	}
 
+	if (nr_pages) {
+		*nr_pages = count;
+		if (count > 0)
+			page = pages[0];
+	}
+
 	if (page)
 		/*
 		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
@@ -2125,7 +2153,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,
 		order, zonelist, high_zoneidx,
 		ALLOC_WMARK_HIGH|ALLOC_CPUSET,
-		preferred_zone, migratetype);
+		preferred_zone, migratetype,
+		NULL, NULL);
 	if (page)
 		goto out;
 
@@ -2188,7 +2217,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		page = get_page_from_freelist(gfp_mask, nodemask,
 				order, zonelist, high_zoneidx,
 				alloc_flags & ~ALLOC_NO_WATERMARKS,
-				preferred_zone, migratetype);
+				preferred_zone, migratetype,
+				NULL, NULL);
 		if (page) {
 			preferred_zone->compact_blockskip_flush = false;
 			preferred_zone->compact_considered = 0;
@@ -2282,7 +2312,8 @@ retry:
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags & ~ALLOC_NO_WATERMARKS,
-					preferred_zone, migratetype);
+					preferred_zone, migratetype,
+					NULL, NULL);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
@@ -2312,7 +2343,8 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 	do {
 		page = get_page_from_freelist(gfp_mask, nodemask, order,
 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			NULL, NULL);
 
 		if (!page && gfp_mask & __GFP_NOFAIL)
 			wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
@@ -2449,7 +2481,8 @@ rebalance:
 	/* This is the last chance, in general, before the goto nopage. */
 	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
 			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			NULL, NULL);
 	if (page)
 		goto got_pg;
 
@@ -2598,7 +2631,8 @@ got_pg:
  */
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
-			struct zonelist *zonelist, nodemask_t *nodemask)
+			struct zonelist *zonelist, nodemask_t *nodemask,
+			unsigned long *nr_pages, struct page **pages)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
@@ -2608,6 +2642,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET;
 	struct mem_cgroup *memcg = NULL;
 
+	VM_BUG_ON(nr_pages && !pages);
+
 	gfp_mask &= gfp_allowed_mask;
 
 	lockdep_trace_alloc(gfp_mask);
@@ -2647,9 +2683,11 @@ retry_cpuset:
 		alloc_flags |= ALLOC_CMA;
 #endif
 	/* First allocation attempt */
+	/* We only try to allocate nr_pages in first attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			nr_pages, pages);
 	if (unlikely(!page)) {
 		/*
 		 * Runtime PM, block IO and its error handling path
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 2/5] mm, page_alloc: introduce alloc_pages_exact_node_multiple()
  2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
@ 2013-07-03  8:34 ` Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 3/5] radix-tree: introduce radix_tree_[next/prev]_present() Joonsoo Kim
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-03  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, David Rientjes, Glauber Costa, Johannes Weiner,
	KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Minchan Kim,
	Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 8bfa87b..f8cde28 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -327,6 +327,16 @@ static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
+static inline struct page *alloc_pages_exact_node_multiple(int nid,
+		gfp_t gfp_mask, unsigned long *nr_pages, struct page **pages)
+{
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+
+	return __alloc_pages_nodemask(gfp_mask, 0,
+				node_zonelist(nid, gfp_mask), NULL,
+				nr_pages, pages);
+}
+
 #ifdef CONFIG_NUMA
 extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 3/5] radix-tree: introduce radix_tree_[next/prev]_present()
  2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 2/5] mm, page_alloc: introduce alloc_pages_exact_node_multiple() Joonsoo Kim
@ 2013-07-03  8:34 ` Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 4/5] readahead: remove end range check Joonsoo Kim
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-03  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, David Rientjes, Glauber Costa, Johannes Weiner,
	KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Minchan Kim,
	Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ffc444c..045b325 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -230,6 +230,10 @@ unsigned long radix_tree_next_hole(struct radix_tree_root *root,
 				unsigned long index, unsigned long max_scan);
 unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
 				unsigned long index, unsigned long max_scan);
+unsigned long radix_tree_next_present(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan);
+unsigned long radix_tree_prev_present(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index e796429..e02e3b8 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -984,6 +984,40 @@ unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
 }
 EXPORT_SYMBOL(radix_tree_prev_hole);
 
+unsigned long radix_tree_next_present(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (radix_tree_lookup(root, index))
+			break;
+		index++;
+		if (index == 0)
+			break;
+	}
+
+	return index;
+}
+EXPORT_SYMBOL(radix_tree_next_present);
+
+unsigned long radix_tree_prev_present(struct radix_tree_root *root,
+				   unsigned long index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (radix_tree_lookup(root, index))
+			break;
+		index--;
+		if (index == ULONG_MAX)
+			break;
+	}
+
+	return index;
+}
+EXPORT_SYMBOL(radix_tree_prev_present);
+
 /**
  *	radix_tree_gang_lookup - perform multiple lookup on a radix tree
  *	@root:		radix tree root
-- 
1.7.9.5
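
The patch adds no kerneldoc, so as a reading aid here is a small
userspace model of the intended semantics (a plain array stands in for
the radix tree; names are illustrative): the helper scans forward from
'index' and returns the index of the first present entry, or
index + max_scan if nothing is present in the scan window (ignoring the
wrap-to-zero case). Patch 5 uses the difference between the return
value and the starting index as the length of the gap to batch-allocate.

#include <stdio.h>
#include <stdbool.h>

#define NSLOTS 16

/* present[i] stands in for "the page cache already has page i". */
static bool present[NSLOTS];

/*
 * Model of radix_tree_next_present(): return the first index >= start
 * that is present, or start + max_scan if none is found in the window.
 */
static unsigned long next_present(unsigned long start, unsigned long max_scan)
{
	unsigned long i, index = start;

	for (i = 0; i < max_scan; i++) {
		if (index < NSLOTS && present[index])
			break;
		index++;
	}
	return index;
}

int main(void)
{
	unsigned long start = 3, end;

	present[7] = true;			/* page 7 is already cached */
	end = next_present(start, 10);
	printf("gap of %lu missing pages starting at index %lu\n",
	       end - start, start);		/* gap of 4: pages 3..6 */
	return 0;
}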


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 4/5] readahead: remove end range check
  2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
                   ` (2 preceding siblings ...)
  2013-07-03  8:34 ` [RFC PATCH 3/5] radix-tree: introduce radix_tree_[next/prev]_present() Joonsoo Kim
@ 2013-07-03  8:34 ` Joonsoo Kim
  2013-07-03  8:34 ` [RFC PATCH 5/5] readhead: support multiple pages allocation for readahead Joonsoo Kim
  2013-07-03 15:28 ` [RFC PATCH 0/5] Support multiple pages allocation Michal Hocko
  5 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-03  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, David Rientjes, Glauber Costa, Johannes Weiner,
	KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Minchan Kim,
	Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/readahead.c b/mm/readahead.c
index daed28d..3932f28 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -166,6 +166,8 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		goto out;
 
 	end_index = ((isize - 1) >> PAGE_CACHE_SHIFT);
+	if (offset + nr_to_read > end_index + 1)
+		nr_to_read = end_index - offset + 1;
 
 	/*
 	 * Preallocate as many pages as we will need.
@@ -173,9 +175,6 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
 		pgoff_t page_offset = offset + page_idx;
 
-		if (page_offset > end_index)
-			break;
-
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->page_tree, page_offset);
 		rcu_read_unlock();
-- 
1.7.9.5
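
The reason for this change is presumably that patch 5 needs to know up
front how many pages it may allocate in one batch, so the per-page
end_index check is replaced by clamping nr_to_read before the loop. A
quick worked example of the clamp, with made-up numbers:

#include <stdio.h>

int main(void)
{
	/* Hypothetical request: the file ends at page index 9 (10 pages). */
	unsigned long end_index = 9, offset = 7, nr_to_read = 8;

	/* Same clamp as the patch: never read past the end of the file. */
	if (offset + nr_to_read > end_index + 1)
		nr_to_read = end_index - offset + 1;

	printf("nr_to_read clamped to %lu (pages 7..9)\n", nr_to_read);
	return 0;
}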


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 5/5] readhead: support multiple pages allocation for readahead
  2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
                   ` (3 preceding siblings ...)
  2013-07-03  8:34 ` [RFC PATCH 4/5] readahead: remove end range check Joonsoo Kim
@ 2013-07-03  8:34 ` Joonsoo Kim
  2013-07-03 15:28 ` [RFC PATCH 0/5] Support multiple pages allocation Michal Hocko
  5 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-03  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, David Rientjes, Glauber Costa, Johannes Weiner,
	KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Minchan Kim,
	Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e3dea75..eb1472c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -217,28 +217,33 @@ static inline void page_unfreeze_refs(struct page *page, int count)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc(gfp_t gfp,
+			unsigned long *nr_pages, struct page **pages);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline struct page *__page_cache_alloc(gfp_t gfp,
+			unsigned long *nr_pages, struct page **pages)
 {
-	return alloc_pages(gfp, 0);
+	return alloc_pages_exact_node_multiple(numa_node_id(),
+						gfp, nr_pages, pages);
 }
 #endif
 
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
-	return __page_cache_alloc(mapping_gfp_mask(x));
+	return __page_cache_alloc(mapping_gfp_mask(x), NULL, NULL);
 }
 
 static inline struct page *page_cache_alloc_cold(struct address_space *x)
 {
-	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
+	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD, NULL, NULL);
 }
 
-static inline struct page *page_cache_alloc_readahead(struct address_space *x)
+static inline struct page *page_cache_alloc_readahead(struct address_space *x,
+				unsigned long *nr_pages, struct page **pages)
 {
 	return __page_cache_alloc(mapping_gfp_mask(x) |
-				  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN);
+				  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN,
+				  nr_pages, pages);
 }
 
 typedef int filler_t(void *, struct page *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 7905fe7..0bbfda9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -510,7 +510,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc(gfp_t gfp,
+			unsigned long *nr_pages, struct page **pages)
 {
 	int n;
 	struct page *page;
@@ -520,12 +521,14 @@ struct page *__page_cache_alloc(gfp_t gfp)
 		do {
 			cpuset_mems_cookie = get_mems_allowed();
 			n = cpuset_mem_spread_node();
-			page = alloc_pages_exact_node(n, gfp, 0);
+			page = alloc_pages_exact_node_multiple(n,
+						gfp, nr_pages, pages);
 		} while (!put_mems_allowed(cpuset_mems_cookie) && !page);
 
 		return page;
 	}
-	return alloc_pages(gfp, 0);
+	return alloc_pages_exact_node_multiple(numa_node_id(), gfp,
+							nr_pages, pages);
 }
 EXPORT_SYMBOL(__page_cache_alloc);
 #endif
@@ -789,7 +792,7 @@ struct page *find_or_create_page(struct address_space *mapping,
 repeat:
 	page = find_lock_page(mapping, index);
 	if (!page) {
-		page = __page_cache_alloc(gfp_mask);
+		page = __page_cache_alloc(gfp_mask, NULL, NULL);
 		if (!page)
 			return NULL;
 		/*
@@ -1053,7 +1056,8 @@ grab_cache_page_nowait(struct address_space *mapping, pgoff_t index)
 		page_cache_release(page);
 		return NULL;
 	}
-	page = __page_cache_alloc(mapping_gfp_mask(mapping) & ~__GFP_FS);
+	page = __page_cache_alloc(mapping_gfp_mask(mapping) & ~__GFP_FS,
+								NULL, NULL);
 	if (page && add_to_page_cache_lru(page, mapping, index, GFP_NOFS)) {
 		page_cache_release(page);
 		page = NULL;
@@ -1806,7 +1810,7 @@ static struct page *__read_cache_page(struct address_space *mapping,
 repeat:
 	page = find_get_page(mapping, index);
 	if (!page) {
-		page = __page_cache_alloc(gfp | __GFP_COLD);
+		page = __page_cache_alloc(gfp | __GFP_COLD, NULL, NULL);
 		if (!page)
 			return ERR_PTR(-ENOMEM);
 		err = add_to_page_cache_lru(page, mapping, index, gfp);
@@ -2281,7 +2285,7 @@ repeat:
 	if (page)
 		goto found;
 
-	page = __page_cache_alloc(gfp_mask & ~gfp_notmask);
+	page = __page_cache_alloc(gfp_mask & ~gfp_notmask, NULL, NULL);
 	if (!page)
 		return NULL;
 	status = add_to_page_cache_lru(page, mapping, index,
diff --git a/mm/readahead.c b/mm/readahead.c
index 3932f28..3e2c377 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -157,40 +157,61 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	struct inode *inode = mapping->host;
 	struct page *page;
 	unsigned long end_index;	/* The last page we want to read */
+	unsigned long end, i;
 	LIST_HEAD(page_pool);
 	int page_idx;
 	int ret = 0;
 	loff_t isize = i_size_read(inode);
+	struct page **pages;
 
 	if (isize == 0)
 		goto out;
 
 	end_index = ((isize - 1) >> PAGE_CACHE_SHIFT);
+	if (offset > end_index)
+		goto out;
+
 	if (offset + nr_to_read > end_index + 1)
 		nr_to_read = end_index - offset + 1;
 
+	pages = kmalloc(sizeof(struct page *) * nr_to_read, GFP_KERNEL);
+	if (!pages)
+		goto out;
+
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
 	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
 		pgoff_t page_offset = offset + page_idx;
+		unsigned long nr_pages;
 
 		rcu_read_lock();
-		page = radix_tree_lookup(&mapping->page_tree, page_offset);
+		end = radix_tree_next_present(&mapping->page_tree,
+				page_offset, nr_to_read - page_idx);
 		rcu_read_unlock();
-		if (page)
+		nr_pages = end - page_offset;
+		if (!nr_pages)
 			continue;
 
-		page = page_cache_alloc_readahead(mapping);
-		if (!page)
-			break;
-		page->index = page_offset;
-		list_add(&page->lru, &page_pool);
-		if (page_idx == nr_to_read - lookahead_size)
-			SetPageReadahead(page);
-		ret++;
+		page_cache_alloc_readahead(mapping, &nr_pages, pages);
+		if (!nr_pages)
+			goto start_io;
+
+		for (i = 0; i < nr_pages; i++) {
+			page = pages[i];
+			page->index = page_offset + i;
+			list_add(&page->lru, &page_pool);
+			if (page_idx == nr_to_read - lookahead_size)
+				SetPageReadahead(page);
+			ret++;
+		}
+
+		/* Skip already checked page */
+		page_idx += nr_pages;
 	}
 
+start_io:
+	kfree(pages);
 	/*
 	 * Now start the IO.  We ignore I/O errors - if the page is not
 	 * uptodate then the caller will launch readpage again, and
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
                   ` (4 preceding siblings ...)
  2013-07-03  8:34 ` [RFC PATCH 5/5] readhead: support multiple pages allocation for readahead Joonsoo Kim
@ 2013-07-03 15:28 ` Michal Hocko
  2013-07-03 15:51   ` Zhang Yanfei
  5 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-07-03 15:28 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, David Rientjes, Glauber Costa,
	Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins,
	Minchan Kim, Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim

On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
[...]
> For one page allocation at once, this patchset makes allocator slower than
> before (-5%). 

Slowing down the most used path is a no-go. Where does this slow down
come from?

[...]
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-03 15:28 ` [RFC PATCH 0/5] Support multiple pages allocation Michal Hocko
@ 2013-07-03 15:51   ` Zhang Yanfei
  2013-07-03 16:01     ` Zhang Yanfei
  0 siblings, 1 reply; 28+ messages in thread
From: Zhang Yanfei @ 2013-07-03 15:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Joonsoo Kim, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel,
	Joonsoo Kim

On 07/03/2013 11:28 PM, Michal Hocko wrote:
> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> [...]
>> For one page allocation at once, this patchset makes allocator slower than
>> before (-5%). 
> 
> Slowing down the most used path is a no-go. Where does this slow down
> come from?

I guess it might be this: for single-page allocations, compared to the
original code, this patch adds two parameters (nr_pages and pages) and
does extra checks on nr_pages in the allocation path.

> 
> [...]


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
@ 2013-07-03 15:57   ` Christoph Lameter
  2013-07-04  4:29     ` Joonsoo Kim
  2013-07-10 22:52   ` Dave Hansen
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2013-07-03 15:57 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, David Rientjes, Glauber Costa,
	Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins,
	Minchan Kim, Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim

On Wed, 3 Jul 2013, Joonsoo Kim wrote:

> @@ -298,13 +298,15 @@ static inline void arch_alloc_page(struct page *page, int order) { }
>
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> -		       struct zonelist *zonelist, nodemask_t *nodemask);
> +		       struct zonelist *zonelist, nodemask_t *nodemask,
> +		       unsigned long *nr_pages, struct page **pages);
>

Add a separate function for the allocation of multiple pages instead?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-03 15:51   ` Zhang Yanfei
@ 2013-07-03 16:01     ` Zhang Yanfei
  2013-07-04  4:24       ` Joonsoo Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Zhang Yanfei @ 2013-07-03 16:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Joonsoo Kim, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel,
	Joonsoo Kim

On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> On 07/03/2013 11:28 PM, Michal Hocko wrote:
>> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
>> [...]
>>> For one page allocation at once, this patchset makes allocator slower than
>>> before (-5%). 
>>
>> Slowing down the most used path is a no-go. Where does this slow down
>> come from?
> 
> I guess, it might be: for one page allocation at once, comparing to the original
> code, this patch adds two parameters nr_pages and pages and will do extra checks
> for the parameter nr_pages in the allocation path.
> 

If so, adding a separate path for the multiple allocations seems better.

>>
>> [...]
> 
> 


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-03 16:01     ` Zhang Yanfei
@ 2013-07-04  4:24       ` Joonsoo Kim
  2013-07-04 10:00         ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-04  4:24 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Michal Hocko, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> > On 07/03/2013 11:28 PM, Michal Hocko wrote:
> >> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> >> [...]
> >>> For one page allocation at once, this patchset makes allocator slower than
> >>> before (-5%). 
> >>
> >> Slowing down the most used path is a no-go. Where does this slow down
> >> come from?
> > 
> > I guess, it might be: for one page allocation at once, comparing to the original
> > code, this patch adds two parameters nr_pages and pages and will do extra checks
> > for the parameter nr_pages in the allocation path.
> > 
> 
> If so, adding a separate path for the multiple allocations seems better.

Hello, all.

I modified the code to optimize single-page allocation via the likely()
macro. I attach the new version at the end of this mail.

In this case, the performance degradation for single-page allocation is
-2.5%. I guess the remaining overhead comes from the two added
parameters. Is that an unreasonable cost to support this new feature?
I think the readahead path is one of the most used paths, so this
penalty looks tolerable. And after this feature is supported, we can
find more use cases.
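
For readers not familiar with the likely() annotation mentioned above:
it is a branch-prediction hint (in the kernel it is built on GCC's
__builtin_expect()) that asks the compiler to lay out the annotated
branch as the fall-through path. A minimal userspace sketch of the
idea, with illustrative names rather than the real allocator code:

#include <stdio.h>
#include <stdlib.h>

/* Same idea as the kernel's likely()/unlikely() annotations. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * The single-page case (nr_pages == NULL) is the hot path, so it is
 * annotated as likely(); the batch bookkeeping runs only on the cold
 * path and should cost the common case almost nothing.
 */
static void *get_page(unsigned long *nr_pages, void **pages)
{
	void *page = malloc(4096);	/* stand-in for buffered_rmqueue() */

	if (likely(!nr_pages))
		return page;		/* hot single-page path */

	/* cold batch path: report how many pages were handed back */
	pages[0] = page;
	*nr_pages = page ? 1 : 0;
	return page;
}

int main(void)
{
	void *page = get_page(NULL, NULL);

	printf("single page at %p\n", page);
	free(page);
	return 0;
}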

I will try adding a separate function for multiple allocations and test
it. But, IMHO, adding a new function is not a good idea, because we
would have to duplicate various checks that are already in
__alloc_pages_nodemask(), and even with a new function we cannot avoid
passing two extra parameters to get_page_from_freelist(), so a slight
performance degradation for single-page allocation is inevitable.
Anyway, I will do it and test it.

Thanks.

-------------------------------8<----------------------------
>From cee05ad3bcf1c5774fabf797b5dc8f78f812ca36 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Wed, 26 Jun 2013 13:37:57 +0900
Subject: [PATCH] mm, page_alloc: support multiple pages allocation

This patch introduces a multiple-page allocation feature to the
buddy allocator. Currently, there is no way to allocate multiple
pages at once, so we must invoke the single-page allocation logic
repeatedly. This incurs overheads such as function-call overhead
with many arguments and the overhead of finding the proper node
and zone.

With this patchset, we can reduce these overheads. Device I/O is
getting faster rapidly and the allocator should keep up with that
speed. This patch helps with that situation.

This patch introduces two new arguments, nr_pages and pages, to
the core function of the allocator, and tries to allocate multiple
pages on the first attempt (the fast path). I think multiple-page
allocation is not appropriate for the slow path, so the current
implementation covers only the fast path.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb..8bfa87b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -298,13 +298,15 @@ static inline void arch_alloc_page(struct page *page, int order) { }
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
-		       struct zonelist *zonelist, nodemask_t *nodemask);
+		       struct zonelist *zonelist, nodemask_t *nodemask,
+		       unsigned long *nr_pages, struct page **pages);
 
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order,
 		struct zonelist *zonelist)
 {
-	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
+	return __alloc_pages_nodemask(gfp_mask, order,
+				zonelist, NULL, NULL, NULL);
 }
 
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7431001..b17e48c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2004,7 +2004,8 @@ retry_cpuset:
 	}
 	page = __alloc_pages_nodemask(gfp, order,
 				      policy_zonelist(gfp, pol, node),
-				      policy_nodemask(gfp, pol));
+				      policy_nodemask(gfp, pol),
+				      NULL, NULL);
 	if (unlikely(mpol_needs_cond_ref(pol)))
 		__mpol_put(pol);
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
@@ -2052,7 +2053,8 @@ retry_cpuset:
 	else
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_zonelist(gfp, pol, numa_node_id()),
-				policy_nodemask(gfp, pol));
+				policy_nodemask(gfp, pol),
+				NULL, NULL);
 
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3edb62..0ba9f63 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1846,7 +1846,8 @@ static inline void init_zone_allows_reclaim(int nid)
 static struct page *
 get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 		struct zonelist *zonelist, int high_zoneidx, int alloc_flags,
-		struct zone *preferred_zone, int migratetype)
+		struct zone *preferred_zone, int migratetype,
+		unsigned long *nr_pages, struct page **pages)
 {
 	struct zoneref *z;
 	struct page *page = NULL;
@@ -1968,8 +1969,33 @@ zonelist_scan:
 try_this_zone:
 		page = buffered_rmqueue(preferred_zone, zone, order,
 						gfp_mask, migratetype);
-		if (page)
+		if (page) {
+			unsigned long mark;
+			unsigned long count;
+			unsigned long nr;
+
+			if (likely(!nr_pages))
+				break;
+
+			count = 0;
+			pages[count++] = page;
+			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
+			nr = *nr_pages;
+			while (count < nr) {
+				if (!zone_watermark_ok(zone, order, mark,
+					classzone_idx, alloc_flags))
+					break;
+				page = buffered_rmqueue(preferred_zone, zone,
+						order, gfp_mask, migratetype);
+				if (!page)
+					break;
+				pages[count++] = page;
+			}
+			*nr_pages = count;
+			page = pages[0];
 			break;
+		}
+
 this_zone_full:
 		if (IS_ENABLED(CONFIG_NUMA))
 			zlc_mark_zone_full(zonelist, z);
@@ -2125,7 +2151,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,
 		order, zonelist, high_zoneidx,
 		ALLOC_WMARK_HIGH|ALLOC_CPUSET,
-		preferred_zone, migratetype);
+		preferred_zone, migratetype,
+		NULL, NULL);
 	if (page)
 		goto out;
 
@@ -2188,7 +2215,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		page = get_page_from_freelist(gfp_mask, nodemask,
 				order, zonelist, high_zoneidx,
 				alloc_flags & ~ALLOC_NO_WATERMARKS,
-				preferred_zone, migratetype);
+				preferred_zone, migratetype,
+				NULL, NULL);
 		if (page) {
 			preferred_zone->compact_blockskip_flush = false;
 			preferred_zone->compact_considered = 0;
@@ -2282,7 +2310,8 @@ retry:
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags & ~ALLOC_NO_WATERMARKS,
-					preferred_zone, migratetype);
+					preferred_zone, migratetype,
+					NULL, NULL);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
@@ -2312,7 +2341,8 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 	do {
 		page = get_page_from_freelist(gfp_mask, nodemask, order,
 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			NULL, NULL);
 
 		if (!page && gfp_mask & __GFP_NOFAIL)
 			wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
@@ -2449,7 +2479,8 @@ rebalance:
 	/* This is the last chance, in general, before the goto nopage. */
 	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
 			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			NULL, NULL);
 	if (page)
 		goto got_pg;
 
@@ -2598,7 +2629,8 @@ got_pg:
  */
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
-			struct zonelist *zonelist, nodemask_t *nodemask)
+			struct zonelist *zonelist, nodemask_t *nodemask,
+			unsigned long *nr_pages, struct page **pages)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
@@ -2647,9 +2679,11 @@ retry_cpuset:
 		alloc_flags |= ALLOC_CMA;
 #endif
 	/* First allocation attempt */
+	/* We only try to allocate nr_pages in first attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
-			preferred_zone, migratetype);
+			preferred_zone, migratetype,
+			nr_pages, pages);
 	if (unlikely(!page)) {
 		/*
 		 * Runtime PM, block IO and its error handling path
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-03 15:57   ` Christoph Lameter
@ 2013-07-04  4:29     ` Joonsoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-04  4:29 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Mel Gorman, David Rientjes, Glauber Costa,
	Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins,
	Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed, Jul 03, 2013 at 03:57:45PM +0000, Christoph Lameter wrote:
> On Wed, 3 Jul 2013, Joonsoo Kim wrote:
> 
> > @@ -298,13 +298,15 @@ static inline void arch_alloc_page(struct page *page, int order) { }
> >
> >  struct page *
> >  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> > -		       struct zonelist *zonelist, nodemask_t *nodemask);
> > +		       struct zonelist *zonelist, nodemask_t *nodemask,
> > +		       unsigned long *nr_pages, struct page **pages);
> >
> 
> Add a separate function for the allocation of multiple pages instead?

Hello.

I will try it, but I would rather not implement a separate function.
Please see my reply to [0/5].

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-04  4:24       ` Joonsoo Kim
@ 2013-07-04 10:00         ` Michal Hocko
  2013-07-10  0:31           ` Joonsoo Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-07-04 10:00 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Zhang Yanfei, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Thu 04-07-13 13:24:50, Joonsoo Kim wrote:
> On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> > On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> > > On 07/03/2013 11:28 PM, Michal Hocko wrote:
> > >> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> > >> [...]
> > >>> For one page allocation at once, this patchset makes allocator slower than
> > >>> before (-5%). 
> > >>
> > >> Slowing down the most used path is a no-go. Where does this slow down
> > >> come from?
> > > 
> > > I guess, it might be: for one page allocation at once, comparing to the original
> > > code, this patch adds two parameters nr_pages and pages and will do extra checks
> > > for the parameter nr_pages in the allocation path.
> > > 
> > 
> > If so, adding a separate path for the multiple allocations seems better.
> 
> Hello, all.
> 
> I modify the code for optimizing one page allocation via likely macro.
> I attach a new one at the end of this mail.
> 
> In this case, performance degradation for one page allocation at once is -2.5%.
> I guess, remained overhead comes from two added parameters.
> Is it unreasonable cost to support this new feature?

Which benchmark you are using for this testing?

> I think that readahead path is one of the most used path, so this penalty looks
> endurable. And after supporting this feature, we can find more use cases.

What about page faults? I would oppose that page faults are _much_ more
frequent than read ahead so you really cannot slow them down.

[...]
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-04 10:00         ` Michal Hocko
@ 2013-07-10  0:31           ` Joonsoo Kim
  2013-07-10  1:20             ` Zhang Yanfei
  2013-07-10  9:17             ` Michal Hocko
  0 siblings, 2 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-10  0:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Zhang Yanfei, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
> On Thu 04-07-13 13:24:50, Joonsoo Kim wrote:
> > On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> > > On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> > > > On 07/03/2013 11:28 PM, Michal Hocko wrote:
> > > >> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> > > >> [...]
> > > >>> For one page allocation at once, this patchset makes allocator slower than
> > > >>> before (-5%). 
> > > >>
> > > >> Slowing down the most used path is a no-go. Where does this slow down
> > > >> come from?
> > > > 
> > > > I guess, it might be: for one page allocation at once, comparing to the original
> > > > code, this patch adds two parameters nr_pages and pages and will do extra checks
> > > > for the parameter nr_pages in the allocation path.
> > > > 
> > > 
> > > If so, adding a separate path for the multiple allocations seems better.
> > 
> > Hello, all.
> > 
> > I modify the code for optimizing one page allocation via likely macro.
> > I attach a new one at the end of this mail.
> > 
> > In this case, performance degradation for one page allocation at once is -2.5%.
> > I guess, remained overhead comes from two added parameters.
> > Is it unreasonable cost to support this new feature?
> 
> Which benchmark you are using for this testing?

I use my own module which does allocations repeatedly.

> 
> > I think that readahead path is one of the most used path, so this penalty looks
> > endurable. And after supporting this feature, we can find more use cases.
> 
> What about page faults? I would oppose that page faults are _much_ more
> frequent than read ahead so you really cannot slow them down.

Do you mean page faults for anon memory?
Yes, I also think those are much more frequent than readahead.
Before further discussion, I will try to add a separate path
for the multiple-page allocations.

Thanks.

> 
> [...]
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-10  0:31           ` Joonsoo Kim
@ 2013-07-10  1:20             ` Zhang Yanfei
  2013-07-10  9:56               ` Joonsoo Kim
  2013-07-10  9:17             ` Michal Hocko
  1 sibling, 1 reply; 28+ messages in thread
From: Zhang Yanfei @ 2013-07-10  1:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Michal Hocko, Zhang Yanfei, Andrew Morton, Mel Gorman,
	David Rientjes, Glauber Costa, Johannes Weiner, KOSAKI Motohiro,
	Rik van Riel, Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm,
	linux-kernel

On 2013/7/10 8:31, Joonsoo Kim wrote:
> On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
>> On Thu 04-07-13 13:24:50, Joonsoo Kim wrote:
>>> On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
>>>> On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
>>>>> On 07/03/2013 11:28 PM, Michal Hocko wrote:
>>>>>> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
>>>>>> [...]
>>>>>>> For one page allocation at once, this patchset makes allocator slower than
>>>>>>> before (-5%). 
>>>>>>
>>>>>> Slowing down the most used path is a no-go. Where does this slow down
>>>>>> come from?
>>>>>
>>>>> I guess, it might be: for one page allocation at once, comparing to the original
>>>>> code, this patch adds two parameters nr_pages and pages and will do extra checks
>>>>> for the parameter nr_pages in the allocation path.
>>>>>
>>>>
>>>> If so, adding a separate path for the multiple allocations seems better.
>>>
>>> Hello, all.
>>>
>>> I modify the code for optimizing one page allocation via likely macro.
>>> I attach a new one at the end of this mail.
>>>
>>> In this case, performance degradation for one page allocation at once is -2.5%.
>>> I guess, remained overhead comes from two added parameters.
>>> Is it unreasonable cost to support this new feature?
>>
>> Which benchmark you are using for this testing?
> 
> I use my own module which do allocation repeatedly.
> 
>>
>>> I think that readahead path is one of the most used path, so this penalty looks
>>> endurable. And after supporting this feature, we can find more use cases.
>>
>> What about page faults? I would oppose that page faults are _much_ more
>> frequent than read ahead so you really cannot slow them down.
> 
> You mean page faults for anon?
> Yes. I also think that it is much more frequent than read ahead.
> Before futher discussion, I will try to add a separate path
> for the multiple allocations.

Some days ago, I was thinking that this multiple allocation behaviour
may be useful for vmalloc allocations. So I think it is worth trying.

> 
> Thanks.
> 
>>
>> [...]
>> -- 
>> Michal Hocko
>> SUSE Labs
>>
> 


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-10  0:31           ` Joonsoo Kim
  2013-07-10  1:20             ` Zhang Yanfei
@ 2013-07-10  9:17             ` Michal Hocko
  2013-07-10  9:55               ` Joonsoo Kim
  1 sibling, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-07-10  9:17 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Zhang Yanfei, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed 10-07-13 09:31:42, Joonsoo Kim wrote:
> On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
> > On Thu 04-07-13 13:24:50, Joonsoo Kim wrote:
> > > On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> > > > On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> > > > > On 07/03/2013 11:28 PM, Michal Hocko wrote:
> > > > >> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> > > > >> [...]
> > > > >>> For one page allocation at once, this patchset makes allocator slower than
> > > > >>> before (-5%). 
> > > > >>
> > > > >> Slowing down the most used path is a no-go. Where does this slow down
> > > > >> come from?
> > > > > 
> > > > > I guess, it might be: for one page allocation at once, comparing to the original
> > > > > code, this patch adds two parameters nr_pages and pages and will do extra checks
> > > > > for the parameter nr_pages in the allocation path.
> > > > > 
> > > > 
> > > > If so, adding a separate path for the multiple allocations seems better.
> > > 
> > > Hello, all.
> > > 
> > > I modify the code for optimizing one page allocation via likely macro.
> > > I attach a new one at the end of this mail.
> > > 
> > > In this case, performance degradation for one page allocation at once is -2.5%.
> > > I guess, remained overhead comes from two added parameters.
> > > Is it unreasonable cost to support this new feature?
> > 
> > Which benchmark you are using for this testing?
> 
> I use my own module which do allocation repeatedly.

I am not sure this microbenchmark will tell us much. Allocations are
usually not short lived so the longer time might get amortized.
If you want to use the multi page allocation for read ahead then try to
model your numbers on read-ahead workloads.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-10  9:17             ` Michal Hocko
@ 2013-07-10  9:55               ` Joonsoo Kim
  2013-07-10 11:27                 ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-10  9:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Zhang Yanfei, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed, Jul 10, 2013 at 11:17:03AM +0200, Michal Hocko wrote:
> On Wed 10-07-13 09:31:42, Joonsoo Kim wrote:
> > On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
> > > On Thu 04-07-13 13:24:50, Joonsoo Kim wrote:
> > > > On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> > > > > On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> > > > > > On 07/03/2013 11:28 PM, Michal Hocko wrote:
> > > > > >> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> > > > > >> [...]
> > > > > >>> For one page allocation at once, this patchset makes allocator slower than
> > > > > >>> before (-5%). 
> > > > > >>
> > > > > >> Slowing down the most used path is a no-go. Where does this slow down
> > > > > >> come from?
> > > > > > 
> > > > > > I guess, it might be: for one page allocation at once, comparing to the original
> > > > > > code, this patch adds two parameters nr_pages and pages and will do extra checks
> > > > > > for the parameter nr_pages in the allocation path.
> > > > > > 
> > > > > 
> > > > > If so, adding a separate path for the multiple allocations seems better.
> > > > 
> > > > Hello, all.
> > > > 
> > > > I modify the code for optimizing one page allocation via likely macro.
> > > > I attach a new one at the end of this mail.
> > > > 
> > > > In this case, performance degradation for one page allocation at once is -2.5%.
> > > > I guess, remained overhead comes from two added parameters.
> > > > Is it unreasonable cost to support this new feature?
> > > 
> > > Which benchmark you are using for this testing?
> > 
> > I use my own module which do allocation repeatedly.
> 
> I am not sure this microbenchmark will tell us much. Allocations are
> usually not short lived so the longer time might get amortized.
> If you want to use the multi page allocation for read ahead then try to
> model your numbers on read-ahead workloads.

Of course. Later, I will get results on read-ahead workloads or on a
vmalloc workload, as recommended by Zhang.

I think that without this microbenchmark we cannot accurately know this
modification's performance effect on single-page allocation, because
the impact on single-page allocation is relatively small and easily
hidden by other factors.

I have now tried several implementations of this feature and found that
a separate path also makes single-page allocation slower (-1.0~-1.5%).
I could not find any reason for it except that the text size of
page_alloc.o is 1500 bytes larger than before.

Before
   text    data     bss     dec     hex filename
  34466    1389     640   36495    8e8f mm/page_alloc.o

sep
   text    data     bss     dec     hex filename
  36074    1413     640   38127    94ef mm/page_alloc.o

The not-yet-posted implementation, which passes two more arguments to
__alloc_pages_nodemask(), also makes single-page allocation slower
(-1.0~-1.5%). So, going forward, I will work with this implementation
rather than the separate-path implementation.

Thanks for the comments!

> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-10  1:20             ` Zhang Yanfei
@ 2013-07-10  9:56               ` Joonsoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-10  9:56 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Michal Hocko, Zhang Yanfei, Andrew Morton, Mel Gorman,
	David Rientjes, Glauber Costa, Johannes Weiner, KOSAKI Motohiro,
	Rik van Riel, Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm,
	linux-kernel

On Wed, Jul 10, 2013 at 09:20:27AM +0800, Zhang Yanfei wrote:
> On 2013/7/10 8:31, Joonsoo Kim wrote:
> > On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
> >> On Thu 04-07-13 13:24:50, Joonsoo Kim wrote:
> >>> On Thu, Jul 04, 2013 at 12:01:43AM +0800, Zhang Yanfei wrote:
> >>>> On 07/03/2013 11:51 PM, Zhang Yanfei wrote:
> >>>>> On 07/03/2013 11:28 PM, Michal Hocko wrote:
> >>>>>> On Wed 03-07-13 17:34:15, Joonsoo Kim wrote:
> >>>>>> [...]
> >>>>>>> For one page allocation at once, this patchset makes allocator slower than
> >>>>>>> before (-5%). 
> >>>>>>
> >>>>>> Slowing down the most used path is a no-go. Where does this slow down
> >>>>>> come from?
> >>>>>
> >>>>> I guess, it might be: for one page allocation at once, comparing to the original
> >>>>> code, this patch adds two parameters nr_pages and pages and will do extra checks
> >>>>> for the parameter nr_pages in the allocation path.
> >>>>>
> >>>>
> >>>> If so, adding a separate path for the multiple allocations seems better.
> >>>
> >>> Hello, all.
> >>>
> >>> I modify the code for optimizing one page allocation via likely macro.
> >>> I attach a new one at the end of this mail.
> >>>
> >>> In this case, performance degradation for one page allocation at once is -2.5%.
> >>> I guess, remained overhead comes from two added parameters.
> >>> Is it unreasonable cost to support this new feature?
> >>
> >> Which benchmark you are using for this testing?
> > 
> > I use my own module which do allocation repeatedly.
> > 
> >>
> >>> I think that readahead path is one of the most used path, so this penalty looks
> >>> endurable. And after supporting this feature, we can find more use cases.
> >>
> >> What about page faults? I would oppose that page faults are _much_ more
> >> frequent than read ahead so you really cannot slow them down.
> > 
> > You mean page faults for anon?
> > Yes. I also think that it is much more frequent than read ahead.
> > Before futher discussion, I will try to add a separate path
> > for the multiple allocations.
> 
> Some days ago, I was thinking that this multiple allocation behaviour
> may be useful for vmalloc allocations. So I think it is worth trying.

Yeh! I think so!
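
As far as I understand, the vmalloc path fills area->pages[] with one
allocation call per page, roughly like the simplified sketch below
(paraphrased from the v3.10-era __vmalloc_area_node(); the helper name is
made up, and node/highmem handling and error unwinding are omitted), so a
multiple pages allocation could replace exactly that per-page loop:

/* Simplified sketch of how a vmalloc area is populated today. */
static int fill_vm_area_pages(struct vm_struct *area, gfp_t gfp_mask)
{
        unsigned int i;

        for (i = 0; i < area->nr_pages; i++) {
                /* one allocator call per page */
                struct page *page = alloc_page(gfp_mask);

                if (!page)
                        return -ENOMEM;
                area->pages[i] = page;
        }
        return 0;
}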

Thanks.

> 
> > 
> > Thanks.
> > 
> >>
> >> [...]
> >> -- 
> >> Michal Hocko
> >> SUSE Labs
> >>
> > 
> 
> 
> -- 
> Thanks.
> Zhang Yanfei
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-10  9:55               ` Joonsoo Kim
@ 2013-07-10 11:27                 ` Michal Hocko
  2013-07-11  1:05                   ` Joonsoo Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-07-10 11:27 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Zhang Yanfei, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed 10-07-13 18:55:33, Joonsoo Kim wrote:
> On Wed, Jul 10, 2013 at 11:17:03AM +0200, Michal Hocko wrote:
> > On Wed 10-07-13 09:31:42, Joonsoo Kim wrote:
> > > On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
[...]
> > > > Which benchmark are you using for this testing?
> > > 
> > > I use my own module which does allocations repeatedly.
> > 
> > I am not sure this microbenchmark will tell us much. Allocations are
> > usually not short lived so the longer time might get amortized.
> > If you want to use the multi page allocation for read ahead then try to
> > model your numbers on read-ahead workloads.
> 
> Of course. Later, I will get results on read-ahead workloads or the
> vmalloc workload recommended by Zhang.
> 
> I think that, without this microbenchmark, we cannot accurately know this
> modification's performance effect on single page allocation, because the
> impact on single page allocation is relatively small and easily hidden by
> other factors.

The main thing is whether the numbers you get from an artificial
microbenchmark matter at all. You might see a regression which cannot be
hit in practice because other effects are orders of magnitude more significant.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
  2013-07-03 15:57   ` Christoph Lameter
@ 2013-07-10 22:52   ` Dave Hansen
  2013-07-11  1:02     ` Joonsoo Kim
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Hansen @ 2013-07-10 22:52 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, David Rientjes, Glauber Costa,
	Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins,
	Minchan Kim, Jiang Liu, linux-mm, linux-kernel, Joonsoo Kim

On 07/03/2013 01:34 AM, Joonsoo Kim wrote:
> -		if (page)
> +		do {
> +			page = buffered_rmqueue(preferred_zone, zone, order,
> +							gfp_mask, migratetype);
> +			if (!page)
> +				break;
> +
> +			if (!nr_pages) {
> +				count++;
> +				break;
> +			}
> +
> +			pages[count++] = page;
> +			if (count >= *nr_pages)
> +				break;
> +
> +			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> +			if (!zone_watermark_ok(zone, order, mark,
> +					classzone_idx, alloc_flags))
> +				break;
> +		} while (1);

I'm really surprised this works as well as it does.  Calling
buffered_rmqueue() a bunch of times enables/disables interrupts a bunch
of times, and mucks with the percpu pages lists a whole bunch.
buffered_rmqueue() is really meant for _single_ pages, not to be called
a bunch of times in a row.

Why not just do a single rmqueue_bulk() call?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-10 22:52   ` Dave Hansen
@ 2013-07-11  1:02     ` Joonsoo Kim
  2013-07-11  5:38       ` Dave Hansen
  0 siblings, 1 reply; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-11  1:02 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Mel Gorman, David Rientjes, Glauber Costa,
	Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins,
	Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed, Jul 10, 2013 at 03:52:42PM -0700, Dave Hansen wrote:
> On 07/03/2013 01:34 AM, Joonsoo Kim wrote:
> > -		if (page)
> > +		do {
> > +			page = buffered_rmqueue(preferred_zone, zone, order,
> > +							gfp_mask, migratetype);
> > +			if (!page)
> > +				break;
> > +
> > +			if (!nr_pages) {
> > +				count++;
> > +				break;
> > +			}
> > +
> > +			pages[count++] = page;
> > +			if (count >= *nr_pages)
> > +				break;
> > +
> > +			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> > +			if (!zone_watermark_ok(zone, order, mark,
> > +					classzone_idx, alloc_flags))
> > +				break;
> > +		} while (1);
> 
> I'm really surprised this works as well as it does.  Calling
> buffered_rmqueue() a bunch of times enables/disables interrupts a bunch
> of times, and mucks with the percpu pages lists a whole bunch.
> buffered_rmqueue() is really meant for _single_ pages, not to be called
> a bunch of times in a row.
> 
> Why not just do a single rmqueue_bulk() call?

Hello, Dave.

There are some reasons why I implemented the feature in this way.

rmqueue_bulk() needs the zone lock. If we allocate only a few pages,
for example 2 or 3, it can have much more overhead than
allocating 1 page multiple times. So, IMHO, it is better that
multiple pages allocation is supported on top of the percpu pages list.

And I think that enabling/disabling interrupts a bunch of times helps
to avoid a latency problem. If we disable interrupts until the whole work
is finished, interrupts can be handled too late.
free_hot_cold_page_list() already enables/disables interrupts a bunch of
times.

Thanks for the helpful comment!

> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] Support multiple pages allocation
  2013-07-10 11:27                 ` Michal Hocko
@ 2013-07-11  1:05                   ` Joonsoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-11  1:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Zhang Yanfei, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed, Jul 10, 2013 at 01:27:37PM +0200, Michal Hocko wrote:
> On Wed 10-07-13 18:55:33, Joonsoo Kim wrote:
> > On Wed, Jul 10, 2013 at 11:17:03AM +0200, Michal Hocko wrote:
> > > On Wed 10-07-13 09:31:42, Joonsoo Kim wrote:
> > > > On Thu, Jul 04, 2013 at 12:00:44PM +0200, Michal Hocko wrote:
> [...]
> > > > > Which benchmark are you using for this testing?
> > > > 
> > > > I use my own module which does allocations repeatedly.
> > > 
> > > I am not sure this microbenchmark will tell us much. Allocations are
> > > usually not short lived so the longer time might get amortized.
> > > If you want to use the multi page allocation for read ahead then try to
> > > model your numbers on read-ahead workloads.
> > 
> > Of course. Later, I will get results on read-ahead workloads or the
> > vmalloc workload recommended by Zhang.
> > 
> > I think that, without this microbenchmark, we cannot accurately know this
> > modification's performance effect on single page allocation, because the
> > impact on single page allocation is relatively small and easily hidden by
> > other factors.
> 
> The main thing is whether the numbers you get from an artificial
> microbenchmark matter at all. You might see a regression which cannot be
> hit in practice because other effects are orders of magnitude more significant.

Okay. I will keep this in mind.

Thanks for your comment.

> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-11  1:02     ` Joonsoo Kim
@ 2013-07-11  5:38       ` Dave Hansen
  2013-07-11  6:12         ` Joonsoo Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Hansen @ 2013-07-11  5:38 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Dave Hansen, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On 07/10/2013 06:02 PM, Joonsoo Kim wrote:
> On Wed, Jul 10, 2013 at 03:52:42PM -0700, Dave Hansen wrote:
>> On 07/03/2013 01:34 AM, Joonsoo Kim wrote:
>>> -		if (page)
>>> +		do {
>>> +			page = buffered_rmqueue(preferred_zone, zone, order,
>>> +							gfp_mask, migratetype);
>>> +			if (!page)
>>> +				break;
>>> +
>>> +			if (!nr_pages) {
>>> +				count++;
>>> +				break;
>>> +			}
>>> +
>>> +			pages[count++] = page;
>>> +			if (count >= *nr_pages)
>>> +				break;
>>> +
>>> +			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
>>> +			if (!zone_watermark_ok(zone, order, mark,
>>> +					classzone_idx, alloc_flags))
>>> +				break;
>>> +		} while (1);
>>
>> I'm really surprised this works as well as it does.  Calling
>> buffered_rmqueue() a bunch of times enables/disables interrupts a bunch
>> of times, and mucks with the percpu pages lists a whole bunch.
>> buffered_rmqueue() is really meant for _single_ pages, not to be called
>> a bunch of times in a row.
>>
>> Why not just do a single rmqueue_bulk() call?
> 
> rmqueue_bulk() needs the zone lock. If we allocate only a few pages,
> for example 2 or 3, it can have much more overhead than
> allocating 1 page multiple times. So, IMHO, it is better that
> multiple pages allocation is supported on top of the percpu pages list.

It makes _zero_ sense to be doing a number of buffered_rmqueue() calls
that are approaching the size of the percpu pages batches.  If you end
up doing that, you pay the overhead of manipulating both the percpu
page data _and_ the buddy lists.

You're probably right for small numbers of pages.  But, if we're talking
about things that are more than, say, 100 pages (isn't the pcp batch
size clamped to 128 4k pages?) you surely don't want to be doing
buffered_rmqueue().

I'd also like to see some scalability numbers on this.  How do your
tests look when all the CPUs on the system are hammering away?

> And I think that enabling/disabling interrupts a bunch of times helps
> to avoid a latency problem. If we disable interrupts until the whole work
> is finished, interrupts can be handled too late.
> free_hot_cold_page_list() already enables/disables interrupts a bunch of
> times.

I don't think interrupt latency is going to be a problem on the scale
we're talking about here.  There are much, much, much worse offenders in
the kernel.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-11  5:38       ` Dave Hansen
@ 2013-07-11  6:12         ` Joonsoo Kim
  2013-07-11 15:51           ` Dave Hansen
  2013-07-12 16:31           ` Dave Hansen
  0 siblings, 2 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-11  6:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dave Hansen, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Wed, Jul 10, 2013 at 10:38:20PM -0700, Dave Hansen wrote:
> On 07/10/2013 06:02 PM, Joonsoo Kim wrote:
> > On Wed, Jul 10, 2013 at 03:52:42PM -0700, Dave Hansen wrote:
> >> On 07/03/2013 01:34 AM, Joonsoo Kim wrote:
> >>> -		if (page)
> >>> +		do {
> >>> +			page = buffered_rmqueue(preferred_zone, zone, order,
> >>> +							gfp_mask, migratetype);
> >>> +			if (!page)
> >>> +				break;
> >>> +
> >>> +			if (!nr_pages) {
> >>> +				count++;
> >>> +				break;
> >>> +			}
> >>> +
> >>> +			pages[count++] = page;
> >>> +			if (count >= *nr_pages)
> >>> +				break;
> >>> +
> >>> +			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> >>> +			if (!zone_watermark_ok(zone, order, mark,
> >>> +					classzone_idx, alloc_flags))
> >>> +				break;
> >>> +		} while (1);
> >>
> >> I'm really surprised this works as well as it does.  Calling
> >> buffered_rmqueue() a bunch of times enables/disables interrupts a bunch
> >> of times, and mucks with the percpu pages lists a whole bunch.
> >> buffered_rmqueue() is really meant for _single_ pages, not to be called
> >> a bunch of times in a row.
> >>
> >> Why not just do a single rmqueue_bulk() call?
> > 
> > rmqueue_bulk() needs the zone lock. If we allocate only a few pages,
> > for example 2 or 3, it can have much more overhead than
> > allocating 1 page multiple times. So, IMHO, it is better that
> > multiple pages allocation is supported on top of the percpu pages list.
> 
> It makes _zero_ sense to be doing a number of buffered_rmqueue() calls
> that are approaching the size of the percpu pages batches.  If you end
> up doing that, you pay the overhead of manipulating both the percpu
> page data _and_ the buddy lists.
> 
> You're probably right for small numbers of pages.  But, if we're talking
> about things that are more than, say, 100 pages (isn't the pcp batch
> size clamped to 128 4k pages?) you surely don't want to be doing
> buffered_rmqueue().

Yes, you are right.
At first, I thought that I could use this for readahead. On my machine,
readahead reads at most 32 pages in advance if faulted, and the batch size
of the percpu pages list is close to or larger than 32 pages
on today's machines, so I didn't consider more than 32 pages before.
But to cope with a request for more pages, using rmqueue_bulk() is
the right way. How about using rmqueue_bulk() conditionally?
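
Something like the following shape is what I mean by "conditionally"
(purely illustrative: the threshold and both helpers are made-up names
standing in for code built on buffered_rmqueue() and rmqueue_bulk(), and
the real code would of course have to respect their locking/irq rules):

#define MULTI_ALLOC_PCP_LIMIT   32      /* hypothetical cut-over point */

static unsigned long alloc_pages_multi(struct zone *preferred_zone,
                                       struct zone *zone, gfp_t gfp_mask,
                                       int migratetype,
                                       struct page **pages,
                                       unsigned long nr_pages)
{
        if (nr_pages <= MULTI_ALLOC_PCP_LIMIT)
                /* few pages: stay on the percpu lists, no zone->lock */
                return alloc_from_pcp(preferred_zone, zone, gfp_mask,
                                      migratetype, pages, nr_pages);

        /* many pages: take zone->lock once and pull them in bulk */
        return alloc_from_buddy_bulk(zone, gfp_mask, migratetype,
                                     pages, nr_pages);
}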

> 
> I'd also like to see some scalability numbers on this.  How do your
> tests look when all the CPUs on the system are hammering away?

What test do you mean?
Please elaborate on this more.

> > And I think that enabling/disabling interrupts a bunch of times helps
> > to avoid a latency problem. If we disable interrupts until the whole work
> > is finished, interrupts can be handled too late.
> > free_hot_cold_page_list() already enables/disables interrupts a bunch of
> > times.
> 
> I don't think interrupt latency is going to be a problem on the scale
> we're talking about here.  There are much, much, much worse offenders in
> the kernel.

Hmm, rmqueue_bulk() doesn't stop until all requested pages are allocated.
If we request too many pages (1024 pages or more), interrupt latency can
be a problem.

Thanks!!


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-11  6:12         ` Joonsoo Kim
@ 2013-07-11 15:51           ` Dave Hansen
  2013-07-16  0:26             ` Joonsoo Kim
  2013-07-12 16:31           ` Dave Hansen
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Hansen @ 2013-07-11 15:51 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Dave Hansen, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On 07/10/2013 11:12 PM, Joonsoo Kim wrote:
>> > I'd also like to see some scalability numbers on this.  How do your
>> > tests look when all the CPUs on the system are hammering away?
> What test do you mean?
> Please elaborate on this more

Your existing tests looked single-threaded.  That's certainly part of
the problem.  Will your patches have implications for larger systems,
though?  How much do your patches speed up or slow things down if we
have many allocations proceeding on many CPUs in parallel?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-11  6:12         ` Joonsoo Kim
  2013-07-11 15:51           ` Dave Hansen
@ 2013-07-12 16:31           ` Dave Hansen
  2013-07-16  0:37             ` Joonsoo Kim
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Hansen @ 2013-07-12 16:31 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Dave Hansen, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On 07/10/2013 11:12 PM, Joonsoo Kim wrote:
> On Wed, Jul 10, 2013 at 10:38:20PM -0700, Dave Hansen wrote:
>> You're probably right for small numbers of pages.  But, if we're talking
>> about things that are more than, say, 100 pages (isn't the pcp batch
>> size clamped to 128 4k pages?) you surely don't want to be doing
>> buffered_rmqueue().
> 
> Yes, you are right.
> At first, I thought that I could use this for readahead. On my machine,
> readahead reads at most 32 pages in advance if faulted, and the batch size
> of the percpu pages list is close to or larger than 32 pages
> on today's machines, so I didn't consider more than 32 pages before.
> But to cope with a request for more pages, using rmqueue_bulk() is
> the right way. How about using rmqueue_bulk() conditionally?

How about you test it both ways and see what is faster?

> Hmm, rmqueue_bulk() doesn't stop until all requested pages are allocated.
> If we request too many pages (1024 pages or more), interrupt latency can
> be a problem.

OK, so only call it for the number of pages you believe allows it to
have acceptable interrupt latency.  If you want 200 pages, and you can
only disable interrupts for 100 pages, then just do it in two batches.

The point is that you want to avoid messing with the buffering by the
percpu structures.  They're just overhead in your case.
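
Roughly something like this (illustrative only; BULK_BATCH and
bulk_alloc_batch() are stand-ins for whatever rmqueue_bulk()-based helper
you end up with, assumed to disable interrupts, take zone->lock, fill
pages[], and return how many pages it actually got):

#define BULK_BATCH      100     /* hypothetical per-batch limit */

static unsigned long alloc_in_batches(struct zone *zone, gfp_t gfp_mask,
                                      int migratetype, struct page **pages,
                                      unsigned long nr_pages)
{
        unsigned long done = 0;

        while (done < nr_pages) {
                /* never keep interrupts off for more than one batch */
                unsigned long chunk = min_t(unsigned long,
                                            nr_pages - done, BULK_BATCH);
                unsigned long got = bulk_alloc_batch(zone, gfp_mask,
                                                     migratetype,
                                                     pages + done, chunk);

                if (!got)
                        break;  /* out of pages; caller deals with it */
                done += got;
        }
        return done;
}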


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-11 15:51           ` Dave Hansen
@ 2013-07-16  0:26             ` Joonsoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-16  0:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dave Hansen, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Thu, Jul 11, 2013 at 08:51:22AM -0700, Dave Hansen wrote:
> On 07/10/2013 11:12 PM, Joonsoo Kim wrote:
> >> > I'd also like to see some scalability numbers on this.  How do your
> >> > tests look when all the CPUs on the system are hammering away?
> > What test do you mean?
> > Please elaborate on this more
> 
> Your existing tests looked single-threaded.  That's certainly part of
> the problem.  Will your patches have implications for larger systems,
> though?  How much do your patches speed up or slow things down if we
> have many allocations proceeding on many CPUs in parallel?

Hello.

Okay, I will do that and attach the result to v2.
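
Roughly, the kind of parallel test I am thinking of is one kthread per
online CPU, each repeatedly allocating and freeing a batch of order-0
pages, as in the sketch below (names and sizes are made up; the threads
would be started with kthread_run(), one per online CPU, and the elapsed
time measured around the loops):

#include <linux/kthread.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define TEST_BATCH      32
#define TEST_LOOPS      10000

static int alloc_test_thread(void *unused)
{
        struct page *pages[TEST_BATCH];
        int i, j;

        for (i = 0; i < TEST_LOOPS && !kthread_should_stop(); i++) {
                /* allocate a batch of single pages ... */
                for (j = 0; j < TEST_BATCH; j++)
                        pages[j] = alloc_page(GFP_KERNEL);
                /* ... and free them again */
                for (j = 0; j < TEST_BATCH; j++)
                        if (pages[j])
                                __free_page(pages[j]);
        }
        return 0;
}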

Thanks.

> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
  2013-07-12 16:31           ` Dave Hansen
@ 2013-07-16  0:37             ` Joonsoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2013-07-16  0:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dave Hansen, Andrew Morton, Mel Gorman, David Rientjes,
	Glauber Costa, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	Hugh Dickins, Minchan Kim, Jiang Liu, linux-mm, linux-kernel

On Fri, Jul 12, 2013 at 09:31:42AM -0700, Dave Hansen wrote:
> On 07/10/2013 11:12 PM, Joonsoo Kim wrote:
> > On Wed, Jul 10, 2013 at 10:38:20PM -0700, Dave Hansen wrote:
> >> You're probably right for small numbers of pages.  But, if we're talking
> >> about things that are more than, say, 100 pages (isn't the pcp batch
> >> size clamped to 128 4k pages?) you surely don't want to be doing
> >> buffered_rmqueue().
> > 
> > Yes, you are right.
> > At first, I thought that I could use this for readahead. On my machine,
> > readahead reads at most 32 pages in advance if faulted, and the batch size
> > of the percpu pages list is close to or larger than 32 pages
> > on today's machines, so I didn't consider more than 32 pages before.
> > But to cope with a request for more pages, using rmqueue_bulk() is
> > the right way. How about using rmqueue_bulk() conditionally?
> 
> How about you test it both ways and see what is faster?

It is not easy to test which one is better, because a difference may
appear only in certain circumstances. Not grabbing the global lock
as much as possible is the preferable approach to me.

> 
> > Hmm, rmqueue_bulk() doesn't stop until all requested pages are allocated.
> > If we request too many pages (1024 pages or more), interrupt latency can
> > be a problem.
> 
> OK, so only call it for the number of pages you believe allows it to
> have acceptable interrupt latency.  If you want 200 pages, and you can
> only disable interrupts for 100 pages, then just do it in two batches.
> 
> The point is that you want to avoid messing with the buffering by the
> percpu structures.  They're just overhead in your case.

Okay.

Thanks.

> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-07-16  0:37 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
2013-07-03 15:57   ` Christoph Lameter
2013-07-04  4:29     ` Joonsoo Kim
2013-07-10 22:52   ` Dave Hansen
2013-07-11  1:02     ` Joonsoo Kim
2013-07-11  5:38       ` Dave Hansen
2013-07-11  6:12         ` Joonsoo Kim
2013-07-11 15:51           ` Dave Hansen
2013-07-16  0:26             ` Joonsoo Kim
2013-07-12 16:31           ` Dave Hansen
2013-07-16  0:37             ` Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 2/5] mm, page_alloc: introduce alloc_pages_exact_node_multiple() Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 3/5] radix-tree: introduce radix_tree_[next/prev]_present() Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 4/5] readahead: remove end range check Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 5/5] readhead: support multiple pages allocation for readahead Joonsoo Kim
2013-07-03 15:28 ` [RFC PATCH 0/5] Support multiple pages allocation Michal Hocko
2013-07-03 15:51   ` Zhang Yanfei
2013-07-03 16:01     ` Zhang Yanfei
2013-07-04  4:24       ` Joonsoo Kim
2013-07-04 10:00         ` Michal Hocko
2013-07-10  0:31           ` Joonsoo Kim
2013-07-10  1:20             ` Zhang Yanfei
2013-07-10  9:56               ` Joonsoo Kim
2013-07-10  9:17             ` Michal Hocko
2013-07-10  9:55               ` Joonsoo Kim
2013-07-10 11:27                 ` Michal Hocko
2013-07-11  1:05                   ` Joonsoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).