linux-kernel.vger.kernel.org archive mirror
* [RFC 0/2] Reduce alloc_contig_range latency
@ 2012-08-14  8:57 Minchan Kim
  2012-08-14  8:57 ` [RFC 1/2] cma: remove __reclaim_pages Minchan Kim
  2012-08-14  8:57 ` [RFC 2/2] cma: support MIGRATE_DISCARD Minchan Kim
  0 siblings, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2012-08-14  8:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Mel Gorman, Rik van Riel, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm, Minchan Kim

Hi All,

I played with CMA's core function alloc_contig_range and
found it to be very slow, so I doubt we can use it in
real practice.

I tested it with a small tweak to make CMA work on x86 under qemu.
The test environment is as follows.

1. x86_64 machine, 2GB RAM, 4 cores, 40MB movable zone.
   alloc_contig_range(movable_zone->middle_pfn, movable_zone->middle_pfn + 10M)
   is attempted every 5 seconds until the background stress test program
   terminates (a rough sketch of the measurement loop follows this list).

2. A background stress program generates lots of clean page cache pages.
   It mimics a movie player.
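
A rough sketch of the measurement loop described above (illustrative only;
stress_test_finished() and record_latency() are hypothetical helpers, not
the actual test code):

static void measure_alloc_contig_latency(unsigned long start_pfn,
					 unsigned long nr_pages)
{
	while (!stress_test_finished()) {
		ktime_t begin = ktime_get();
		s64 usec;
		int ret;

		/* try to grab [start_pfn, start_pfn + nr_pages) */
		ret = alloc_contig_range(start_pfn, start_pfn + nr_pages,
					 MIGRATE_CMA);
		usec = ktime_us_delta(ktime_get(), begin);

		record_latency(usec, ret == 0);
		if (!ret)
			free_contig_range(start_pfn, nr_pages);
		msleep(5000);	/* retry every 5 seconds */
	}
}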

alloc_contig_range's latency unit: usec
before:
min 204000 max 8156000 mean 3109310.34482759 success count 58

after:
min 8000 max 112000 mean 45788.2352941177 success count 85

So this patch set reduces latency by about 8 seconds in the worst case and
about 3 seconds in the mean case.
I'm away until tomorrow, so please understand if I can't reply instantly.

Minchan Kim (2):
  cma: remove __reclaim_pages
  cma: support MIGRATE_DISCARD

 include/linux/migrate_mode.h |   11 +++++--
 include/linux/mm.h           |    2 +-
 include/linux/mmzone.h       |    9 ------
 mm/compaction.c              |    2 +-
 mm/migrate.c                 |   50 +++++++++++++++++++++++------
 mm/page_alloc.c              |   73 +++++-------------------------------------
 6 files changed, 58 insertions(+), 89 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/2] cma: remove __reclaim_pages
  2012-08-14  8:57 [RFC 0/2] Reduce alloc_contig_range latency Minchan Kim
@ 2012-08-14  8:57 ` Minchan Kim
  2012-08-15 18:49   ` Rik van Riel
  2012-08-16 13:58   ` Mel Gorman
  2012-08-14  8:57 ` [RFC 2/2] cma: support MIGRATE_DISCARD Minchan Kim
  1 sibling, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2012-08-14  8:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Mel Gorman, Rik van Riel, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm, Minchan Kim

Currently CMA reclaims too many pages via __reclaim_pages, whose
comment says:

        * Reclaim enough pages to make sure that contiguous allocation
        * will not starve the system.

Starve? How does it starve the system? The function that allocates
free pages for migration targets wakes up kswapd and does direct reclaim
if needed during migration, so the system doesn't starve.

Let's remove __reclaim_pages and the related functions and fields.

I also modified split_free_page slightly: with __reclaim_pages removed,
isolate_freepages_range can fail because of split_free_page's watermark
check. This is critical for CMA because it ends up failing alloc_contig_range.

I think we don't need the check in the CMA case because CMA allocates
free pages for migration targets via alloc_pages (in migrate_pages), not
isolate_freepages_block, so the watermark is already checked in alloc_pages.
If the condition isn't met, kswapd/direct reclaim can reclaim pages.
There is no reason to check it again in split_free_page.
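
For reference, the migration-target allocation mentioned above boils down to
roughly the following (a paraphrase of __alloc_contig_migrate_alloc in
mm/page_alloc.c as of this series' base; see the tree for the exact code):

static struct page *
__alloc_contig_migrate_alloc(struct page *page, unsigned long private,
			     int **resultp)
{
	gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;

	/*
	 * alloc_page() -> alloc_pages() performs the watermark checks
	 * and can wake kswapd / enter direct reclaim as needed.
	 */
	if (PageHighMem(page))
		gfp_mask |= __GFP_HIGHMEM;

	return alloc_page(gfp_mask);
}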

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/mm.h     |    2 +-
 include/linux/mmzone.h |    9 ------
 mm/compaction.c        |    2 +-
 mm/page_alloc.c        |   71 +++++-------------------------------------------
 4 files changed, 9 insertions(+), 75 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0514fe9..fd042ae 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -441,7 +441,7 @@ void put_page(struct page *page);
 void put_pages_list(struct list_head *pages);
 
 void split_page(struct page *page, unsigned int order);
-int split_free_page(struct page *page);
+int split_free_page(struct page *page, bool check_wmark);
 
 /*
  * Compound pages have a destructor function.  Provide a
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..ca034a1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -63,10 +63,8 @@ enum {
 
 #ifdef CONFIG_CMA
 #  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#  define cma_wmark_pages(zone)	zone->min_cma_pages
 #else
 #  define is_migrate_cma(migratetype) false
-#  define cma_wmark_pages(zone) 0
 #endif
 
 #define for_each_migratetype_order(order, type) \
@@ -376,13 +374,6 @@ struct zone {
 	/* see spanned/present_pages for more description */
 	seqlock_t		span_seqlock;
 #endif
-#ifdef CONFIG_CMA
-	/*
-	 * CMA needs to increase watermark levels during the allocation
-	 * process to make sure that the system is not starved.
-	 */
-	unsigned long		min_cma_pages;
-#endif
 	struct free_area	free_area[MAX_ORDER];
 
 #ifndef CONFIG_SPARSEMEM
diff --git a/mm/compaction.c b/mm/compaction.c
index e78cb96..8afa6dc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -85,7 +85,7 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
 		}
 
 		/* Found a free page, break it into order-0 pages */
-		isolated = split_free_page(page);
+		isolated = split_free_page(page, !strict);
 		if (!isolated && strict)
 			return 0;
 		total_isolated += isolated;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cefac39..d8540eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1388,7 +1388,7 @@ void split_page(struct page *page, unsigned int order)
  * Note: this is probably too low level an operation for use in drivers.
  * Please consult with lkml before using this in your driver.
  */
-int split_free_page(struct page *page)
+int split_free_page(struct page *page, bool check_wmark)
 {
 	unsigned int order;
 	unsigned long watermark;
@@ -1399,10 +1399,12 @@ int split_free_page(struct page *page)
 	zone = page_zone(page);
 	order = page_order(page);
 
-	/* Obey watermarks as if the page was being allocated */
-	watermark = low_wmark_pages(zone) + (1 << order);
-	if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
-		return 0;
+	if (check_wmark) {
+		/* Obey watermarks as if the page was being allocated */
+		watermark = low_wmark_pages(zone) + (1 << order);
+		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+			return 0;
+	}
 
 	/* Remove page from free list */
 	list_del(&page->lru);
@@ -5116,10 +5118,6 @@ static void __setup_per_zone_wmarks(void)
 		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);
 		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
 
-		zone->watermark[WMARK_MIN] += cma_wmark_pages(zone);
-		zone->watermark[WMARK_LOW] += cma_wmark_pages(zone);
-		zone->watermark[WMARK_HIGH] += cma_wmark_pages(zone);
-
 		setup_zone_migrate_reserve(zone);
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
@@ -5671,54 +5669,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	return ret > 0 ? 0 : ret;
 }
 
-/*
- * Update zone's cma pages counter used for watermark level calculation.
- */
-static inline void __update_cma_watermarks(struct zone *zone, int count)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&zone->lock, flags);
-	zone->min_cma_pages += count;
-	spin_unlock_irqrestore(&zone->lock, flags);
-	setup_per_zone_wmarks();
-}
-
-/*
- * Trigger memory pressure bump to reclaim some pages in order to be able to
- * allocate 'count' pages in single page units. Does similar work as
- *__alloc_pages_slowpath() function.
- */
-static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
-{
-	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
-	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
-	int did_some_progress = 0;
-	int order = 1;
-
-	/*
-	 * Increase level of watermarks to force kswapd do his job
-	 * to stabilise at new watermark level.
-	 */
-	__update_cma_watermarks(zone, count);
-
-	/* Obey watermarks as if the page was being allocated */
-	while (!zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0)) {
-		wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone));
-
-		did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
-						      NULL);
-		if (!did_some_progress) {
-			/* Exhausted what can be done so it's blamo time */
-			out_of_memory(zonelist, gfp_mask, order, NULL, false);
-		}
-	}
-
-	/* Restore original watermark levels. */
-	__update_cma_watermarks(zone, -count);
-
-	return count;
-}
-
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -5742,7 +5692,6 @@ static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
 int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype)
 {
-	struct zone *zone = page_zone(pfn_to_page(start));
 	unsigned long outer_start, outer_end;
 	int ret = 0, order;
 
@@ -5817,12 +5766,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		goto done;
 	}
 
-	/*
-	 * Reclaim enough pages to make sure that contiguous allocation
-	 * will not starve the system.
-	 */
-	__reclaim_pages(zone, GFP_HIGHUSER_MOVABLE, end-start);
-
 	/* Grab isolated pages from freelists. */
 	outer_end = isolate_freepages_range(outer_start, end);
 	if (!outer_end) {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-14  8:57 [RFC 0/2] Reduce alloc_contig_range latency Minchan Kim
  2012-08-14  8:57 ` [RFC 1/2] cma: remove __reclaim_pages Minchan Kim
@ 2012-08-14  8:57 ` Minchan Kim
  2012-08-14 14:19   ` Michal Nazarewicz
  2012-08-15 18:58   ` Rik van Riel
  1 sibling, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2012-08-14  8:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Mel Gorman, Rik van Riel, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm, Minchan Kim

This patch introduces a MIGRATE_DISCARD mode for migration.
It drops clean cache pages instead of migrating them so that
migration latency can be reduced. Of course, it could
evict code pages, but the latency of a big contiguous allocation
is more important than some background application's slowdown
in a mobile embedded environment.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/migrate_mode.h |   11 +++++++---
 mm/migrate.c                 |   50 +++++++++++++++++++++++++++++++++---------
 mm/page_alloc.c              |    2 +-
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index ebf3d89..04ca19c 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -6,11 +6,16 @@
  *	on most operations but not ->writepage as the potential stall time
  *	is too significant
  * MIGRATE_SYNC will block when migrating pages
+ * MIGRATE_DISCARD will discard clean cache pages instead of migrating them
+ *
+ * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT and MIGRATE_SYNC must not be
+ * OR'ed together.
  */
 enum migrate_mode {
-	MIGRATE_ASYNC,
-	MIGRATE_SYNC_LIGHT,
-	MIGRATE_SYNC,
+	MIGRATE_ASYNC = 1 << 0,
+	MIGRATE_SYNC_LIGHT = 1 << 1,
+	MIGRATE_SYNC = 1 << 2,
+	MIGRATE_DISCARD = 1 << 3,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..8119a59 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -225,7 +225,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
 	struct buffer_head *bh = head;
 
 	/* Simple case, sync compaction */
-	if (mode != MIGRATE_ASYNC) {
+	if (!(mode & MIGRATE_ASYNC)) {
 		do {
 			get_bh(bh);
 			lock_buffer(bh);
@@ -313,7 +313,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	 * the mapping back due to an elevated page count, we would have to
 	 * block waiting on other references to be dropped.
 	 */
-	if (mode == MIGRATE_ASYNC && head &&
+	if (mode & MIGRATE_ASYNC && head &&
 			!buffer_migrate_lock_buffers(head, mode)) {
 		page_unfreeze_refs(page, expected_count);
 		spin_unlock_irq(&mapping->tree_lock);
@@ -521,7 +521,7 @@ int buffer_migrate_page(struct address_space *mapping,
 	 * with an IRQ-safe spinlock held. In the sync case, the buffers
 	 * need to be locked now
 	 */
-	if (mode != MIGRATE_ASYNC)
+	if (!(mode & MIGRATE_ASYNC))
 		BUG_ON(!buffer_migrate_lock_buffers(head, mode));
 
 	ClearPagePrivate(page);
@@ -603,7 +603,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 {
 	if (PageDirty(page)) {
 		/* Only writeback pages in full synchronous migration */
-		if (mode != MIGRATE_SYNC)
+		if (!(mode & MIGRATE_SYNC))
 			return -EBUSY;
 		return writeout(mapping, page);
 	}
@@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	int remap_swapcache = 1;
 	struct mem_cgroup *mem;
 	struct anon_vma *anon_vma = NULL;
+	enum ttu_flags ttu_flags;
+	bool discard_mode = false;
+	bool file = false;
 
 	if (!trylock_page(page)) {
-		if (!force || mode == MIGRATE_ASYNC)
+		if (!force || mode & MIGRATE_ASYNC)
 			goto out;
 
 		/*
@@ -733,7 +736,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 * the retry loop is too short and in the sync-light case,
 		 * the overhead of stalling is too much
 		 */
-		if (mode != MIGRATE_SYNC) {
+		if (!(mode & MIGRATE_SYNC)) {
 			rc = -EBUSY;
 			goto uncharge;
 		}
@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto skip_unmap;
 	}
 
+	file = page_is_file_cache(page);
+	ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS;
+
+	if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page))
+		ttu_flags |= TTU_MIGRATION;
+	else
+		discard_mode = true;
+
 	/* Establish migration ptes or remove ptes */
-	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+	try_to_unmap(page, ttu_flags);
 
 skip_unmap:
-	if (!page_mapped(page))
-		rc = move_to_new_page(newpage, page, remap_swapcache, mode);
+	if (!page_mapped(page)) {
+		if (!discard_mode)
+			rc = move_to_new_page(newpage, page, remap_swapcache, mode);
+		else {
+			struct address_space *mapping;
+			mapping = page_mapping(page);
+
+			if (page_has_private(page)) {
+				if (!try_to_release_page(page, GFP_KERNEL)) {
+					rc = -EAGAIN;
+					goto uncharge;
+				}
+			}
+
+			if (remove_mapping(mapping, page))
+				rc = 0;
+			else
+				rc = -EAGAIN;
+			goto uncharge;
+		}
+	}
 
 	if (rc && remap_swapcache)
 		remove_migration_ptes(page, page);
@@ -907,7 +937,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	rc = -EAGAIN;
 
 	if (!trylock_page(hpage)) {
-		if (!force || mode != MIGRATE_SYNC)
+		if (!force || !(mode & MIGRATE_SYNC))
 			goto out;
 		lock_page(hpage);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8540eb..58ea96d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5662,7 +5662,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 
 		ret = migrate_pages(&cc.migratepages,
 				    __alloc_contig_migrate_alloc,
-				    0, false, MIGRATE_SYNC);
+				    0, false, MIGRATE_SYNC|MIGRATE_DISCARD);
 	}
 
 	putback_lru_pages(&cc.migratepages);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-14  8:57 ` [RFC 2/2] cma: support MIGRATE_DISCARD Minchan Kim
@ 2012-08-14 14:19   ` Michal Nazarewicz
  2012-08-15 23:20     ` Minchan Kim
  2012-08-15 18:58   ` Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Nazarewicz @ 2012-08-14 14:19 UTC (permalink / raw)
  To: Minchan Kim, Marek Szyprowski
  Cc: Mel Gorman, Rik van Riel, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm, Minchan Kim

[-- Attachment #1: Type: text/plain, Size: 2623 bytes --]

Minchan Kim <minchan@kernel.org> writes:
> This patch introduces a MIGRATE_DISCARD mode for migration.
> It drops clean cache pages instead of migrating them so that
> migration latency can be reduced. Of course, it could
> evict code pages, but the latency of a big contiguous allocation
> is more important than some background application's slowdown
> in a mobile embedded environment.
>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

This looks good to me.

> ---
>  include/linux/migrate_mode.h |   11 +++++++---
>  mm/migrate.c                 |   50 +++++++++++++++++++++++++++++++++---------
>  mm/page_alloc.c              |    2 +-
>  3 files changed, 49 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index ebf3d89..04ca19c 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -6,11 +6,16 @@
>   *	on most operations but not ->writepage as the potential stall time
>   *	is too significant
>   * MIGRATE_SYNC will block when migrating pages
> + * MIGRATE_DISCARD will discard clean cache pages instead of migrating them
> + *
> + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT and MIGRATE_SYNC must not be
> + * OR'ed together.
>   */
>  enum migrate_mode {
> -	MIGRATE_ASYNC,
> -	MIGRATE_SYNC_LIGHT,
> -	MIGRATE_SYNC,
> +	MIGRATE_ASYNC = 1 << 0,
> +	MIGRATE_SYNC_LIGHT = 1 << 1,
> +	MIGRATE_SYNC = 1 << 2,
> +	MIGRATE_DISCARD = 1 << 3,
>  };

Since CMA is the only user of MIGRATE_DISCARD it may be worth it to
guard it inside an #ifdef, eg:

#ifdef CONFIG_CMA
	MIGRATE_DISCARD = 1 << 3,
#define is_migrate_discard(mode) (((mode) & MIGRATE_DISCARD) == MIGRATE_DISCARD)
#endif

  
>  #endif		/* MIGRATE_MODE_H_INCLUDED */
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 77ed2d7..8119a59 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  	int remap_swapcache = 1;
>  	struct mem_cgroup *mem;
>  	struct anon_vma *anon_vma = NULL;
> +	enum ttu_flags ttu_flags;
> +	bool discard_mode = false;
> +	bool file = false;
>  
>  	if (!trylock_page(page)) {
> -		if (!force || mode == MIGRATE_ASYNC)
> +		if (!force || mode & MIGRATE_ASYNC)

+		if (!force || (mode & MIGRATE_ASYNC))

>  			goto out;
>  
>  		/*


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

[-- Attachment #2.1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2.2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/2] cma: remove __reclaim_pages
  2012-08-14  8:57 ` [RFC 1/2] cma: remove __reclaim_pages Minchan Kim
@ 2012-08-15 18:49   ` Rik van Riel
  2012-08-16 13:58   ` Mel Gorman
  1 sibling, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2012-08-15 18:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Marek Szyprowski, Mel Gorman, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm

On 08/14/2012 04:57 AM, Minchan Kim wrote:
> Currently CMA reclaims too many pages via __reclaim_pages, whose
> comment says:
>
>          * Reclaim enough pages to make sure that contiguous allocation
>          * will not starve the system.
>
> Starve? How does it starve the system? The function that allocates
> free pages for migration targets wakes up kswapd and does direct reclaim
> if needed during migration, so the system doesn't starve.
>
> Let's remove __reclaim_pages and the related functions and fields.

Fair enough.

> Signed-off-by: Minchan Kim <minchan@kernel.org>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-14  8:57 ` [RFC 2/2] cma: support MIGRATE_DISCARD Minchan Kim
  2012-08-14 14:19   ` Michal Nazarewicz
@ 2012-08-15 18:58   ` Rik van Riel
  2012-08-15 23:33     ` Minchan Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2012-08-15 18:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Marek Szyprowski, Mel Gorman, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm

On 08/14/2012 04:57 AM, Minchan Kim wrote:
> This patch introduces a MIGRATE_DISCARD mode for migration.
> It drops clean cache pages instead of migrating them so that
> migration latency can be reduced. Of course, it could
> evict code pages, but the latency of a big contiguous allocation
> is more important than some background application's slowdown
> in a mobile embedded environment.

Would it be an idea to only drop clean UNMAPPED
page cache pages?

> Signed-off-by: Minchan Kim <minchan@kernel.org>

> @@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>   		goto skip_unmap;
>   	}
>
> +	file = page_is_file_cache(page);
> +	ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS;
> +
> +	if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page))
> +		ttu_flags |= TTU_MIGRATION;
> +	else
> +		discard_mode = true;
> +
>   	/* Establish migration ptes or remove ptes */
> -	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> +	try_to_unmap(page, ttu_flags);

This bit looks wrong, because you end up ignoring
mlock and then discarding the page.

Only dropping clean page cache pages that are not
mapped would avoid that problem, without introducing
much complexity in the code.

That would turn the test above into:

	if (!page_mapped(page))
		discard_mode = true;

>   skip_unmap:
> -	if (!page_mapped(page))
> -		rc = move_to_new_page(newpage, page, remap_swapcache, mode);
> +	if (!page_mapped(page)) {
> +		if (!discard_mode)
> +			rc = move_to_new_page(newpage, page, remap_swapcache, mode);
> +		else {
> +			struct address_space *mapping;
> +			mapping = page_mapping(page);
> +
> +			if (page_has_private(page)) {
> +				if (!try_to_release_page(page, GFP_KERNEL)) {
> +					rc = -EAGAIN;
> +					goto uncharge;
> +				}
> +			}
> +
> +			if (remove_mapping(mapping, page))
> +				rc = 0;
> +			else
> +				rc = -EAGAIN;
> +			goto uncharge;
> +		}
> +	}

This big piece of code could probably be split out
into its own function.
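
A minimal sketch of what such a helper could look like (an editorial
illustration of the suggestion, not code from either mail; the error
handling mirrors the hunk quoted above):

static int discard_page(struct page *page)
{
	struct address_space *mapping = page_mapping(page);

	if (page_has_private(page) &&
	    !try_to_release_page(page, GFP_KERNEL))
		return -EAGAIN;

	if (remove_mapping(mapping, page))
		return 0;

	return -EAGAIN;
}

The call site in __unmap_and_move() would then reduce to something like
"rc = discard_mode ? discard_page(page) : move_to_new_page(...)".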




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-14 14:19   ` Michal Nazarewicz
@ 2012-08-15 23:20     ` Minchan Kim
  2012-08-16 13:17       ` Michal Nazarewicz
  0 siblings, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2012-08-15 23:20 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, Mel Gorman, Rik van Riel, Kamezawa Hiroyuki,
	Andrew Morton, linux-kernel, linux-mm

Hi Michal,

On Tue, Aug 14, 2012 at 04:19:55PM +0200, Michal Nazarewicz wrote:
> Minchan Kim <minchan@kernel.org> writes:
> > This patch introduces a MIGRATE_DISCARD mode for migration.
> > It drops clean cache pages instead of migrating them so that
> > migration latency can be reduced. Of course, it could
> > evict code pages, but the latency of a big contiguous allocation
> > is more important than some background application's slowdown
> > in a mobile embedded environment.
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> This looks good to me.
> 
> > ---
> >  include/linux/migrate_mode.h |   11 +++++++---
> >  mm/migrate.c                 |   50 +++++++++++++++++++++++++++++++++---------
> >  mm/page_alloc.c              |    2 +-
> >  3 files changed, 49 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> > index ebf3d89..04ca19c 100644
> > --- a/include/linux/migrate_mode.h
> > +++ b/include/linux/migrate_mode.h
> > @@ -6,11 +6,16 @@
> >   *	on most operations but not ->writepage as the potential stall time
> >   *	is too significant
> >   * MIGRATE_SYNC will block when migrating pages
> > + * MIGRATE_DISCARD will discard clean cache pages instead of migrating them
> > + *
> > + * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT and MIGRATE_SYNC must not be
> > + * OR'ed together.
> >   */
> >  enum migrate_mode {
> > -	MIGRATE_ASYNC,
> > -	MIGRATE_SYNC_LIGHT,
> > -	MIGRATE_SYNC,
> > +	MIGRATE_ASYNC = 1 << 0,
> > +	MIGRATE_SYNC_LIGHT = 1 << 1,
> > +	MIGRATE_SYNC = 1 << 2,
> > +	MIGRATE_DISCARD = 1 << 3,
> >  };
> 
> Since CMA is the only user of MIGRATE_DISCARD it may be worth it to
> guard it inside an #ifdef, eg:
> 
> #ifdef CONFIG_CMA
> 	MIGRATE_DISCARD = 1 << 3,
> #define is_migrate_discard(mode) (((mode) & MIGRATE_DISCARD) == MIGRATE_DISCARD)

The mode bit can be combined with other bits, like
MIGRATE_SYNC|MIGRATE_DISCARD, so (mode & MIGRATE_DISCARD) is the correct test.

Anyway, I don't want to restrict it to CMA only because I think there are
potential users elsewhere in mm.
For example, in the memory-hotplug case: if there is not enough free memory
in the system but lots of page cache pages in the migration source range,
we can drop the page cache pages instead of migrating them, which might be
better than failing memory hot-remove.

In summary, I want to keep it open for potential future use cases, unless
anyone objects strongly.
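
To make the idea concrete, a hypothetical memory hot-remove caller could just
pass the extra bit where it already calls migrate_pages() (sketch only; the
callback name and argument values here are assumed, not taken from the tree):

	/* in do_migrate_range(), hypothetically */
	ret = migrate_pages(&source, hotremove_migrate_alloc, 0, true,
			    MIGRATE_SYNC | MIGRATE_DISCARD);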

> #endif
> 
>   
> >  #endif		/* MIGRATE_MODE_H_INCLUDED */
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 77ed2d7..8119a59 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -685,9 +685,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >  	int remap_swapcache = 1;
> >  	struct mem_cgroup *mem;
> >  	struct anon_vma *anon_vma = NULL;
> > +	enum ttu_flags ttu_flags;
> > +	bool discard_mode = false;
> > +	bool file = false;
> >  
> >  	if (!trylock_page(page)) {
> > -		if (!force || mode == MIGRATE_ASYNC)
> > +		if (!force || mode & MIGRATE_ASYNC)

It's not wrong technically but for readability, NP.

> 
> +		if (!force || (mode & MIGRATE_ASYNC))
> 
> >  			goto out;
> >  
> >  		/*
> 
> 
> -- 
> Best regards,                                         _     _
> .o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
> ..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
> ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--





-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-15 18:58   ` Rik van Riel
@ 2012-08-15 23:33     ` Minchan Kim
  2012-08-16  0:15       ` Minchan Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2012-08-15 23:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Marek Szyprowski, Mel Gorman, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm

Hi Rik,

On Wed, Aug 15, 2012 at 02:58:01PM -0400, Rik van Riel wrote:
> On 08/14/2012 04:57 AM, Minchan Kim wrote:
> >This patch introduces a MIGRATE_DISCARD mode for migration.
> >It drops clean cache pages instead of migrating them so that
> >migration latency can be reduced. Of course, it could
> >evict code pages, but the latency of a big contiguous allocation
> >is more important than some background application's slowdown
> >in a mobile embedded environment.
> 
> Would it be an idea to only drop clean UNMAPPED
> page cache pages?

At first I thought about that, but I chose the more aggressive approach,
namely dropping even mapped page cache.
It can reduce latency further (e.g., the memcpy + remapping cost
during migration), and that cost is not trivial if the migration range is big.

> 
> >Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> >@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >  		goto skip_unmap;
> >  	}
> >
> >+	file = page_is_file_cache(page);
> >+	ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS;
> >+
> >+	if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page))
> >+		ttu_flags |= TTU_MIGRATION;
> >+	else
> >+		discard_mode = true;
> >+
> >  	/* Establish migration ptes or remove ptes */
> >-	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> >+	try_to_unmap(page, ttu_flags);
> 
> This bit looks wrong, because you end up ignoring
> mlock and then discarding the page.

Argh, Thanks!
I will fix it in next spin.

> 
> Only dropping clean page cache pages that are not
> mapped would avoid that problem, without introducing
> much complexity in the code.

Hmm, I don't think it makes the code much more complex.
How about this?

diff --git a/mm/rmap.c b/mm/rmap.c
index 0f3b7cd..0909d79 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1223,7 +1223,8 @@ out:
  * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file.
  */
 int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
-                    unsigned long address, enum ttu_flags flags)
+                    unsigned long address, enum ttu_flags flags,
+                    unsigned long *vm_flags)
 {
        struct mm_struct *mm = vma->vm_mm;
        pte_t *pte;
@@ -1235,6 +1236,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
        if (!pte)
                goto out;
 
+       *vm_flags |= vma->vm_flags;
        /*
         * If the page is mlock()d, we cannot swap it out.
         * If it's recently referenced (perhaps page_referenced
@@ -1652,7 +1654,7 @@ out:
  * SWAP_FAIL   - the page is unswappable
  * SWAP_MLOCK  - page is mlocked.
  */
-int try_to_unmap(struct page *page, enum ttu_flags flags)
+int try_to_unmap(struct page *page, enum ttu_flags flags, unsigned long *vm_flags)
 {
        int ret;

<snip> 

+       file = page_is_file_cache(page);
+       ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS;
+
+       if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page) ||
+               vm_flags & VM_LOCKED)
+               ttu_flags |= TTU_MIGRATION;
+       else
+               discard_mode = true;
+


> 
> That would turn the test above into:
> 
> 	if (!page_mapped(page))
> 		discard_mode = true;
> 
> >  skip_unmap:
> >-	if (!page_mapped(page))
> >-		rc = move_to_new_page(newpage, page, remap_swapcache, mode);
> >+	if (!page_mapped(page)) {
> >+		if (!discard_mode)
> >+			rc = move_to_new_page(newpage, page, remap_swapcache, mode);
> >+		else {
> >+			struct address_space *mapping;
> >+			mapping = page_mapping(page);
> >+
> >+			if (page_has_private(page)) {
> >+				if (!try_to_release_page(page, GFP_KERNEL)) {
> >+					rc = -EAGAIN;
> >+					goto uncharge;
> >+				}
> >+			}
> >+
> >+			if (remove_mapping(mapping, page))
> >+				rc = 0;
> >+			else
> >+				rc = -EAGAIN;
> >+			goto uncharge;
> >+		}
> >+	}
> 
> This big piece of code could probably be split out
> into its own function.
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-15 23:33     ` Minchan Kim
@ 2012-08-16  0:15       ` Minchan Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Minchan Kim @ 2012-08-16  0:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Marek Szyprowski, Mel Gorman, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm

On Thu, Aug 16, 2012 at 08:33:23AM +0900, Minchan Kim wrote:
> Hi Rik,
> 
> On Wed, Aug 15, 2012 at 02:58:01PM -0400, Rik van Riel wrote:
> > On 08/14/2012 04:57 AM, Minchan Kim wrote:
> > >This patch introduces a MIGRATE_DISCARD mode for migration.
> > >It drops clean cache pages instead of migrating them so that
> > >migration latency can be reduced. Of course, it could
> > >evict code pages, but the latency of a big contiguous allocation
> > >is more important than some background application's slowdown
> > >in a mobile embedded environment.
> > 
> > Would it be an idea to only drop clean UNMAPPED
> > page cache pages?
> 
> At first I thought about that, but I chose the more aggressive approach,
> namely dropping even mapped page cache.
> It can reduce latency further (e.g., the memcpy + remapping cost
> during migration), and that cost is not trivial if the migration range is big.
> 
> > 
> > >Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > >@@ -799,12 +802,39 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> > >  		goto skip_unmap;
> > >  	}
> > >
> > >+	file = page_is_file_cache(page);
> > >+	ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS;
> > >+
> > >+	if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page))
> > >+		ttu_flags |= TTU_MIGRATION;
> > >+	else
> > >+		discard_mode = true;
> > >+
> > >  	/* Establish migration ptes or remove ptes */
> > >-	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> > >+	try_to_unmap(page, ttu_flags);
> > 
> > This bit looks wrong, because you end up ignoring
> > mlock and then discarding the page.
> 
> Argh, Thanks!
> I will fix it in next spin.
> 
> > 
> > Only dropping clean page cache pages that are not
> > mapped would avoid that problem, without introducing
> > much complexity in the code.
> 
> Hmm, I don't think it makes the code much more complex.
> How about this?
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0f3b7cd..0909d79 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1223,7 +1223,8 @@ out:
>   * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file.
>   */
>  int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> -                    unsigned long address, enum ttu_flags flags)
> +                    unsigned long address, enum ttu_flags flags,
> +                    unsigned long *vm_flags)
>  {
>         struct mm_struct *mm = vma->vm_mm;
>         pte_t *pte;
> @@ -1235,6 +1236,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>         if (!pte)
>                 goto out;
>  
> +       *vm_flags |= vma->vm_flags;
>         /*
>          * If the page is mlock()d, we cannot swap it out.
>          * If it's recently referenced (perhaps page_referenced
> @@ -1652,7 +1654,7 @@ out:
>   * SWAP_FAIL   - the page is unswappable
>   * SWAP_MLOCK  - page is mlocked.
>   */
> -int try_to_unmap(struct page *page, enum ttu_flags flags)
> +int try_to_unmap(struct page *page, enum ttu_flags flags, unsigned long *vm_flags)
>  {
>         int ret;
> 
> <snip> 
> 
> +       file = page_is_file_cache(page);
> +       ttu_flags = TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS;
> +
> +       if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page) ||
> +               vm_flags & VM_LOCKED)

We do try_to_unmap after this check, so we can't get that information in
advance. :(
I don't have a better idea without its own drawbacks, so I will accept your
suggestion.
Thanks, Rik.

> +               ttu_flags |= TTU_MIGRATION;
> +       else
> +               discard_mode = true;
> +
> 
> 
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] cma: support MIGRATE_DISCARD
  2012-08-15 23:20     ` Minchan Kim
@ 2012-08-16 13:17       ` Michal Nazarewicz
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Nazarewicz @ 2012-08-16 13:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Marek Szyprowski, Mel Gorman, Rik van Riel, Kamezawa Hiroyuki,
	Andrew Morton, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

> On Tue, Aug 14, 2012 at 04:19:55PM +0200, Michal Nazarewicz wrote:
>> Since CMA is the only user of MIGRATE_DISCARD it may be worth it to
>> guard it inside an #ifdef, eg:

Minchan Kim <minchan@kernel.org> writes:
> In summary, I want to keep it open for potential future use cases, unless
> anyone objects strongly.

Fair enough.

>>>  	if (!trylock_page(page)) {
>>> -		if (!force || mode == MIGRATE_ASYNC)
>>> +		if (!force || mode & MIGRATE_ASYNC)

> It's not wrong technically but for readability, NP.

Yep, that was my point, thanks. :)

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

[-- Attachment #2.1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2.2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/2] cma: remove __reclaim_pages
  2012-08-14  8:57 ` [RFC 1/2] cma: remove __reclaim_pages Minchan Kim
  2012-08-15 18:49   ` Rik van Riel
@ 2012-08-16 13:58   ` Mel Gorman
  2012-08-17  1:05     ` Minchan Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2012-08-16 13:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Marek Szyprowski, Rik van Riel, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm

On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote:
> Currently CMA reclaims too many pages via __reclaim_pages, whose
> comment says:
> 
>         * Reclaim enough pages to make sure that contiguous allocation
>         * will not starve the system.
> 
> Starve? How does it starve the system? The function that allocates
> free pages for migration targets wakes up kswapd and does direct reclaim
> if needed during migration, so the system doesn't starve.
> 

I thought this patch was overkill at the time it was introduced but
didn't have a concrete reason to reject it when I commented on it
https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed
up with "contiguous allocations should have higher priority than others"
which I took to mean that he was also ok with excessive reclaim.

> Let's remove __reclaim_pages and the related functions and fields.
> 

That should be one patch and I don't object to it being removed as such
but it's Marek's call.

> I also modified split_free_page slightly: with __reclaim_pages removed,
> isolate_freepages_range can fail because of split_free_page's watermark
> check. This is critical for CMA because it ends up failing alloc_contig_range.
> 

This is a big change and should have been in a patch on its
own. split_free_page checks watermarks because if the watermarks are
not obeyed a zone can become fully allocated. This can cause a system to
livelock under certain circumstances if a page cannot be allocated and a
free page is required before other pages can be freed.

> I think we don't need the check in the CMA case because CMA allocates
> free pages for migration targets via alloc_pages (in migrate_pages), not
> isolate_freepages_block, so the watermark is already checked in alloc_pages.

It uses alloc_pages when migrating pages out of the CMA area but note
that it uses isolate_freepages_block when allocating the CMA buffer when
alloc_contig_range calls isolate_freepages_range

isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
{
	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
               isolated = isolate_freepages_block(pfn, block_end_pfn,
                                                   &freelist, true);
	}
	map_pages(&freelist);
}

so the actual CMA allocation itself is not using alloc_pages. By removing
the watermark check you allow CMA to breach the watermarks and put the
system at risk of livelock.

I'm not keen on the split_free_page() change at all.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/2] cma: remove __reclaim_pages
  2012-08-16 13:58   ` Mel Gorman
@ 2012-08-17  1:05     ` Minchan Kim
  2012-08-17 14:48       ` Marek Szyprowski
  0 siblings, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2012-08-17  1:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Marek Szyprowski, Rik van Riel, Kamezawa Hiroyuki, Andrew Morton,
	linux-kernel, linux-mm

Hi Mel,

On Thu, Aug 16, 2012 at 02:58:18PM +0100, Mel Gorman wrote:
> On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote:
> > Currently CMA reclaims too many pages via __reclaim_pages, whose
> > comment says:
> > 
> >         * Reclaim enough pages to make sure that contiguous allocation
> >         * will not starve the system.
> > 
> > Starve? How does it starve the system? The function that allocates
> > free pages for migration targets wakes up kswapd and does direct reclaim
> > if needed during migration, so the system doesn't starve.
> > 
> 
> I thought this patch was overkill at the time it was introduced but
> didn't have a concrete reason to reject it when I commented on it
> https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed
> up with "contiguous allocations should have higher priority than others"
> which I took to mean that he was also ok with excessive reclaim.

I think OOM-killing background applications is more appropriate than a big
latency hit for the foreground application (e.g. the camera app) on your
mobile phone.
In other words, excessive reclaim is *really* bad; it took 8 seconds in my
test in the worst case. :(

> 
> > Let's remove __reclaim_pages and the related functions and fields.
> > 
> 
> That should be one patch and I don't object to it being removed as such
> but it's Marek's call.

Marek. Any thought?

> 
> > I also modified split_free_page slightly: with __reclaim_pages removed,
> > isolate_freepages_range can fail because of split_free_page's watermark
> > check. This is critical for CMA because it ends up failing alloc_contig_range.
> > 
> 
> This is a big change and should have been in a patch on its
> own. split_free_page checks watermarks because if the watermarks are
> not obeyed a zone can become fully allocated. This can cause a system to
> livelock under certain circumstances if a page cannot be allocated and a
> free page is required before other pages can be freed.
> 
> > I think we don't need the check in the CMA case because CMA allocates
> > free pages for migration targets via alloc_pages (in migrate_pages), not
> > isolate_freepages_block, so the watermark is already checked in alloc_pages.
> 
> It uses alloc_pages when migrating pages out of the CMA area but note
> that it uses isolate_freepages_block when allocating the CMA buffer when
> alloc_contig_range calls isolate_freepages_range
> 
> isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
> {
> 	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
>                isolated = isolate_freepages_block(pfn, block_end_pfn,
>                                                    &freelist, true);
> 	}
> 	map_pages(&freelist);
> }
> 
> so the actual CMA allocation itself is not using alloc_pages. By removing
> the watermark check you allow CMA to breach the watermarks and put the
> system at risk of livelock.

Fair enough. I will look into that.
Thanks, Mel.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFC 1/2] cma: remove __reclaim_pages
  2012-08-17  1:05     ` Minchan Kim
@ 2012-08-17 14:48       ` Marek Szyprowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marek Szyprowski @ 2012-08-17 14:48 UTC (permalink / raw)
  To: 'Minchan Kim', 'Mel Gorman'
  Cc: 'Rik van Riel', 'Kamezawa Hiroyuki',
	'Andrew Morton',
	linux-kernel, linux-mm

Hi Minchan,

On Friday, August 17, 2012 3:06 AM Minchan Kim wrote:

> On Thu, Aug 16, 2012 at 02:58:18PM +0100, Mel Gorman wrote:
> > On Tue, Aug 14, 2012 at 05:57:06PM +0900, Minchan Kim wrote:
> > > Currently CMA reclaims too many pages via __reclaim_pages, whose
> > > comment says:
> > >
> > >         * Reclaim enough pages to make sure that contiguous allocation
> > >         * will not starve the system.
> > >
> > > Starve? How does it starve the system? The function that allocates
> > > free pages for migration targets wakes up kswapd and does direct reclaim
> > > if needed during migration, so the system doesn't starve.
> > >
> >
> > I thought this patch was overkill at the time it was introduced but
> > didn't have a concrete reason to reject it when I commented on it
> > https://lkml.org/lkml/2012/1/30/136 . Marek did want this and followed
> > up with "contiguous allocations should have higher priority than others"
> > which I took to mean that he was also ok with excessive reclaim.
> 
> I think OOM-killing background applications is more appropriate than a big
> latency hit for the foreground application (e.g. the camera app) on your
> mobile phone.
> In other words, excessive reclaim is *really* bad; it took 8 seconds in my
> test in the worst case. :(
> 
> >
> > > Let's remove __reclaim_pages and the related functions and fields.
> > >
> >
> > That should be one patch and I don't object to it being removed as such
> > but it's Marek's call.
> 
> Marek. Any thought?

Well, I've introduced this function as a result of solving Mel's request:

http://www.spinics.net/lists/linux-mm/msg28485.html

Before that, I solved it in almost the same way as your current patch. I'm
aware that the __reclaim_pages() approach might be a little overkill, but I
haven't found anything better yet.

> > > I also modified split_free_page slightly: with __reclaim_pages removed,
> > > isolate_freepages_range can fail because of split_free_page's watermark
> > > check. This is critical for CMA because it ends up failing alloc_contig_range.
> > >
> >
> > This is a big change and should have been in a patch on its
> > own. split_free_page checks watermarks because if the watermarks are
> > not obeyed a zone can become fully allocated. This can cause a system to
> > livelock under certain circumstances if a page cannot be allocated and a
> > free page is required before other pages can be freed.
> >
> > > I think we don't need the check in the CMA case because CMA allocates
> > > free pages for migration targets via alloc_pages (in migrate_pages), not
> > > isolate_freepages_block, so the watermark is already checked in alloc_pages.
> >
> > It uses alloc_pages when migrating pages out of the CMA area but note
> > that it uses isolate_freepages_block when allocating the CMA buffer when
> > alloc_contig_range calls isolate_freepages_range
> >
> > isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
> > {
> > 	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
> >                isolated = isolate_freepages_block(pfn, block_end_pfn,
> >                                                    &freelist, true);
> > 	}
> > 	map_pages(&freelist);
> > }
> >
> > so the actual CMA allocation itself is not using alloc_pages. By removing
> > the watermark check you allow CMA to breach the watermarks and put the
> > system at risk of livelock.
> 
> Fair enough. I will look into that.

We will also look into this issue.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-08-17 14:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-14  8:57 [RFC 0/2] Reduce alloc_contig_range latency Minchan Kim
2012-08-14  8:57 ` [RFC 1/2] cma: remove __reclaim_pages Minchan Kim
2012-08-15 18:49   ` Rik van Riel
2012-08-16 13:58   ` Mel Gorman
2012-08-17  1:05     ` Minchan Kim
2012-08-17 14:48       ` Marek Szyprowski
2012-08-14  8:57 ` [RFC 2/2] cma: support MIGRATE_DISCARD Minchan Kim
2012-08-14 14:19   ` Michal Nazarewicz
2012-08-15 23:20     ` Minchan Kim
2012-08-16 13:17       ` Michal Nazarewicz
2012-08-15 18:58   ` Rik van Riel
2012-08-15 23:33     ` Minchan Kim
2012-08-16  0:15       ` Minchan Kim
