* [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
From: Krzysztof Kozlowski @ 2013-08-06  6:42 UTC
  To: Seth Jennings, linux-mm, linux-kernel, Andrew Morton
  Cc: Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski,
	Kyungmin Park, Krzysztof Kozlowski

Hi,

Currently zbud pages are not movable and cannot be allocated from the CMA
region. These patches try to address the problem by:
1. Adding a new form of reclaim of zbud pages.
2. Reclaiming zbud pages during migration and compaction.
3. Allocating zbud pages with __GFP_RECLAIMABLE flag.

This reclaim process is different from zbud_reclaim_page(). It acts more
like swapoff(), trying to unuse the pages stored in a zbud page and bring
them back into memory. The standard zbud_reclaim_page(), on the other
hand, tries to write them back.
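
To give an idea of the interface, the ops structure that a zbud user like
zswap registers could end up looking roughly like this under the series
(a sketch only; zswap_writeback_entry() is zswap's existing evict
callback, while zswap_unuse_entry() is a hypothetical name for the new
unuse callback added in patch 4/4):

static struct zbud_ops zswap_zbud_ops = {
	.evict = zswap_writeback_entry,	/* write stored data back to swap */
	.unuse = zswap_unuse_entry,	/* bring stored data back into memory */
};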

One of the patches introduces a new page flag: PageZbud. This flag is
used in isolate_migratepages_range() to grab zbud pages and pass them on
for reclaim. It could probably be replaced with something smarter than a
flag used in only one place.
Any ideas for a better solution are welcome.

This patch set is based on Linux 3.11-rc4.

TODOs:
1. Replace the PageZbud flag with a different solution.

Best regards,
Krzysztof Kozlowski


Krzysztof Kozlowski (4):
  zbud: use page ref counter for zbud pages
  mm: split code for unusing swap entries from try_to_unuse
  mm: add zbud flag to page flags
  mm: reclaim zbud pages on migration and compaction

 include/linux/page-flags.h |   12 ++
 include/linux/swapfile.h   |    2 +
 include/linux/zbud.h       |   11 +-
 mm/compaction.c            |   20 ++-
 mm/internal.h              |    1 +
 mm/page_alloc.c            |    9 ++
 mm/swapfile.c              |  354 +++++++++++++++++++++++---------------------
 mm/zbud.c                  |  301 +++++++++++++++++++++++++------------
 mm/zswap.c                 |   57 ++++++-
 9 files changed, 499 insertions(+), 268 deletions(-)

-- 
1.7.9.5


* [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
From: Krzysztof Kozlowski @ 2013-08-06  6:42 UTC
  To: Seth Jennings, linux-mm, linux-kernel, Andrew Morton
  Cc: Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski,
	Kyungmin Park, Krzysztof Kozlowski, Tomasz Stanislawski

Use the page reference counter for zbud pages. The ref counter replaces
the zbud_header.under_reclaim flag and ensures that a zbud page won't be
freed when zbud_free() is called during reclaim. It also allows
additional reclaim paths to be implemented.

The page count is incremented when:
 - a handle is created and passed to zswap (in zbud_alloc()),
 - the user-supplied eviction callback is called (in zbud_reclaim_page()).
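
For illustration, the expected life cycle of such a reference from a zbud
user's point of view is roughly the following (a sketch, not code from
this patch; includes and error handling are trimmed):

static void example_handle_user(struct zbud_pool *pool, void *data, int size)
{
	unsigned long handle;

	if (zbud_alloc(pool, size, GFP_KERNEL, &handle))
		return;			/* no page, no reference taken */

	/* the handle pins the zbud page through its page ref count */
	memcpy(zbud_map(pool, handle), data, size);
	zbud_unmap(pool, handle);

	zbud_free(pool, handle);	/* drops the reference; the page is
					 * freed once nobody else holds it */
}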

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
---
 mm/zbud.c |  150 +++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 86 insertions(+), 64 deletions(-)

diff --git a/mm/zbud.c b/mm/zbud.c
index ad1e781..a8e986f 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -109,7 +109,6 @@ struct zbud_header {
 	struct list_head lru;
 	unsigned int first_chunks;
 	unsigned int last_chunks;
-	bool under_reclaim;
 };
 
 /*****************
@@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
 	zhdr->last_chunks = 0;
 	INIT_LIST_HEAD(&zhdr->buddy);
 	INIT_LIST_HEAD(&zhdr->lru);
-	zhdr->under_reclaim = 0;
 	return zhdr;
 }
 
-/* Resets the struct page fields and frees the page */
-static void free_zbud_page(struct zbud_header *zhdr)
-{
-	__free_page(virt_to_page(zhdr));
-}
-
 /*
  * Encodes the handle of a particular buddy within a zbud page
  * Pool lock should be held as this function accesses first|last_chunks
@@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr)
 	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
 }
 
+/*
+ * Called after zbud_free() or zbud_alloc().
+ * Checks whether given zbud page has to be:
+ *  - removed from buddied/unbuddied/LRU lists completely (zbud_free).
+ *  - moved from buddied to unbuddied list
+ *    and to beginning of LRU (zbud_alloc, zbud_free),
+ *  - added to buddied list and LRU (zbud_alloc),
+ *
+ * The page must be already removed from buddied/unbuddied lists.
+ * Must be called under pool->lock.
+ */
+static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr)
+{
+	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
+		/* zbud_free() */
+		list_del(&zhdr->lru);
+		return;
+	} else if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
+		/* zbud_free() or zbud_alloc() */
+		int freechunks = num_free_chunks(zhdr);
+		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+	} else {
+		/* zbud_alloc() */
+		list_add(&zhdr->buddy, &pool->buddied);
+	}
+	/* Add/move zbud page to beginning of LRU */
+	if (!list_empty(&zhdr->lru))
+		list_del(&zhdr->lru);
+	list_add(&zhdr->lru, &pool->lru);
+}
+
+/*
+ * Increases ref count for zbud page.
+ */
+static void get_zbud_page(struct zbud_header *zhdr)
+{
+	get_page(virt_to_page(zhdr));
+}
+
+/*
+ * Decreases ref count for zbud page and frees the page if it reaches 0
+ * (no external references, e.g. handles).
+ *
+ * Must be called under pool->lock.
+ *
+ * Returns 1 if page was freed and 0 otherwise.
+ */
+static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
+{
+	struct page *page = virt_to_page(zhdr);
+	if (put_page_testzero(page)) {
+		free_hot_cold_page(page, 0);
+		pool->pages_nr--;
+		return 1;
+	}
+	return 0;
+}
+
+
 /*****************
  * API Functions
 *****************/
@@ -250,7 +301,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
 int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 			unsigned long *handle)
 {
-	int chunks, i, freechunks;
+	int chunks, i;
 	struct zbud_header *zhdr = NULL;
 	enum buddy bud;
 	struct page *page;
@@ -273,6 +324,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 				bud = FIRST;
 			else
 				bud = LAST;
+			get_zbud_page(zhdr);
 			goto found;
 		}
 	}
@@ -284,6 +336,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 		return -ENOMEM;
 	spin_lock(&pool->lock);
 	pool->pages_nr++;
+	/*
+	 * We will be using zhdr instead of page, so
+	 * don't increase the page count.
+	 */
 	zhdr = init_zbud_page(page);
 	bud = FIRST;
 
@@ -293,19 +349,7 @@ found:
 	else
 		zhdr->last_chunks = chunks;
 
-	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
-		/* Add to unbuddied list */
-		freechunks = num_free_chunks(zhdr);
-		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-	} else {
-		/* Add to buddied list */
-		list_add(&zhdr->buddy, &pool->buddied);
-	}
-
-	/* Add/move zbud page to beginning of LRU */
-	if (!list_empty(&zhdr->lru))
-		list_del(&zhdr->lru);
-	list_add(&zhdr->lru, &pool->lru);
+	rebalance_lists(pool, zhdr);
 
 	*handle = encode_handle(zhdr, bud);
 	spin_unlock(&pool->lock);
@@ -326,10 +370,10 @@ found:
 void zbud_free(struct zbud_pool *pool, unsigned long handle)
 {
 	struct zbud_header *zhdr;
-	int freechunks;
 
 	spin_lock(&pool->lock);
 	zhdr = handle_to_zbud_header(handle);
+	BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0);
 
 	/* If first buddy, handle will be page aligned */
 	if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK)
@@ -337,26 +381,9 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
 	else
 		zhdr->first_chunks = 0;
 
-	if (zhdr->under_reclaim) {
-		/* zbud page is under reclaim, reclaim will free */
-		spin_unlock(&pool->lock);
-		return;
-	}
-
-	/* Remove from existing buddy list */
 	list_del(&zhdr->buddy);
-
-	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-		/* zbud page is empty, free */
-		list_del(&zhdr->lru);
-		free_zbud_page(zhdr);
-		pool->pages_nr--;
-	} else {
-		/* Add to unbuddied list */
-		freechunks = num_free_chunks(zhdr);
-		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-	}
-
+	rebalance_lists(pool, zhdr);
+	put_zbud_page(pool, zhdr);
 	spin_unlock(&pool->lock);
 }
 
@@ -400,7 +427,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
  */
 int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 {
-	int i, ret, freechunks;
+	int i, ret;
 	struct zbud_header *zhdr;
 	unsigned long first_handle = 0, last_handle = 0;
 
@@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 		return -EINVAL;
 	}
 	for (i = 0; i < retries; i++) {
+		if (list_empty(&pool->lru)) {
+			/*
+			 * LRU was emptied during evict calls in previous
+			 * iteration but put_zbud_page() returned 0 meaning
+			 * that someone still holds the page. This may
+			 * happen when some other mm mechanism increased
+			 * the page count.
+			 * In such a case we succeeded with reclaim.
+			 */
+			return 0;
+		}
 		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
+		BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);
+		/* Move this last element to beginning of LRU */
 		list_del(&zhdr->lru);
-		list_del(&zhdr->buddy);
+		list_add(&zhdr->lru, &pool->lru);
 		/* Protect zbud page against free */
-		zhdr->under_reclaim = true;
+		get_zbud_page(zhdr);
 		/*
 		 * We need encode the handles before unlocking, since we can
 		 * race with free that will set (first|last)_chunks to 0
@@ -441,28 +481,10 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 		}
 next:
 		spin_lock(&pool->lock);
-		zhdr->under_reclaim = false;
-		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-			/*
-			 * Both buddies are now free, free the zbud page and
-			 * return success.
-			 */
-			free_zbud_page(zhdr);
-			pool->pages_nr--;
+		if (put_zbud_page(pool, zhdr)) {
 			spin_unlock(&pool->lock);
 			return 0;
-		} else if (zhdr->first_chunks == 0 ||
-				zhdr->last_chunks == 0) {
-			/* add to unbuddied list */
-			freechunks = num_free_chunks(zhdr);
-			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-		} else {
-			/* add to buddied list */
-			list_add(&zhdr->buddy, &pool->buddied);
 		}
-
-		/* add to beginning of LRU */
-		list_add(&zhdr->lru, &pool->lru);
 	}
 	spin_unlock(&pool->lock);
 	return -EAGAIN;
-- 
1.7.9.5


* [RFC PATCH 2/4] mm: split code for unusing swap entries from try_to_unuse
From: Krzysztof Kozlowski @ 2013-08-06  6:42 UTC
  To: Seth Jennings, linux-mm, linux-kernel, Andrew Morton
  Cc: Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski,
	Kyungmin Park, Krzysztof Kozlowski

Move the code for unusing swap entries out of the loop in try_to_unuse()
into a separate function: try_to_unuse_swp_entry(). Export the new
function in swapfile.h, just as try_to_unuse() is exported.
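
For illustration, a caller outside the swapoff loop could drive the new
helper roughly like this (a sketch only, not part of this patch; it
mirrors how try_to_unuse() manages start_mm):

static int example_unuse_one(swp_entry_t entry)
{
	struct swap_info_struct *si = swap_info[swp_type(entry)];
	struct mm_struct *start_mm = &init_mm;
	int err;

	atomic_inc(&init_mm.mm_users);
	err = try_to_unuse_swp_entry(&start_mm, si, entry);
	mmput(start_mm);

	return err;	/* 0 when the entry was unused, negative on error */
}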

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
---
 include/linux/swapfile.h |    2 +
 mm/swapfile.c            |  354 ++++++++++++++++++++++++----------------------
 2 files changed, 187 insertions(+), 169 deletions(-)

diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h
index e282624..68c24a7 100644
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -9,5 +9,7 @@ extern spinlock_t swap_lock;
 extern struct swap_list_t swap_list;
 extern struct swap_info_struct *swap_info[];
 extern int try_to_unuse(unsigned int, bool, unsigned long);
+extern int try_to_unuse_swp_entry(struct mm_struct **start_mm,
+		struct swap_info_struct *si, swp_entry_t entry);
 
 #endif /* _LINUX_SWAPFILE_H */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..331d0b8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1100,6 +1100,189 @@ static unsigned int find_next_to_unuse(struct swap_info_struct *si,
 }
 
 /*
+ * Returns:
+ *  - negative on error,
+ *  - 0 on success (entry unused)
+ */
+int try_to_unuse_swp_entry(struct mm_struct **start_mm,
+		struct swap_info_struct *si, swp_entry_t entry)
+{
+	pgoff_t offset = swp_offset(entry);
+	unsigned char *swap_map;
+	unsigned char swcount;
+	struct page *page;
+	int retval = 0;
+
+	if (signal_pending(current)) {
+		retval = -EINTR;
+		goto out;
+	}
+
+	/*
+	 * Get a page for the entry, using the existing swap
+	 * cache page if there is one.  Otherwise, get a clean
+	 * page and read the swap into it.
+	 */
+	swap_map = &si->swap_map[offset];
+	page = read_swap_cache_async(entry,
+				GFP_HIGHUSER_MOVABLE, NULL, 0);
+	if (!page) {
+		/*
+		 * Either swap_duplicate() failed because entry
+		 * has been freed independently, and will not be
+		 * reused since sys_swapoff() already disabled
+		 * allocation from here, or alloc_page() failed.
+		 */
+		if (!*swap_map)
+			retval = 0;
+		else
+			retval = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * Don't hold on to start_mm if it looks like exiting.
+	 */
+	if (atomic_read(&(*start_mm)->mm_users) == 1) {
+		mmput(*start_mm);
+		*start_mm = &init_mm;
+		atomic_inc(&init_mm.mm_users);
+	}
+
+	/*
+	 * Wait for and lock page.  When do_swap_page races with
+	 * try_to_unuse, do_swap_page can handle the fault much
+	 * faster than try_to_unuse can locate the entry.  This
+	 * apparently redundant "wait_on_page_locked" lets try_to_unuse
+	 * defer to do_swap_page in such a case - in some tests,
+	 * do_swap_page and try_to_unuse repeatedly compete.
+	 */
+	wait_on_page_locked(page);
+	wait_on_page_writeback(page);
+	lock_page(page);
+	wait_on_page_writeback(page);
+
+	/*
+	 * Remove all references to entry.
+	 */
+	swcount = *swap_map;
+	if (swap_count(swcount) == SWAP_MAP_SHMEM) {
+		retval = shmem_unuse(entry, page);
+		VM_BUG_ON(retval > 0);
+		/* page has already been unlocked and released */
+		goto out;
+	}
+	if (swap_count(swcount) && *start_mm != &init_mm)
+		retval = unuse_mm(*start_mm, entry, page);
+
+	if (swap_count(*swap_map)) {
+		int set_start_mm = (*swap_map >= swcount);
+		struct list_head *p = &(*start_mm)->mmlist;
+		struct mm_struct *new_start_mm = *start_mm;
+		struct mm_struct *prev_mm = *start_mm;
+		struct mm_struct *mm;
+
+		atomic_inc(&new_start_mm->mm_users);
+		atomic_inc(&prev_mm->mm_users);
+		spin_lock(&mmlist_lock);
+		while (swap_count(*swap_map) && !retval &&
+				(p = p->next) != &(*start_mm)->mmlist) {
+			mm = list_entry(p, struct mm_struct, mmlist);
+			if (!atomic_inc_not_zero(&mm->mm_users))
+				continue;
+			spin_unlock(&mmlist_lock);
+			mmput(prev_mm);
+			prev_mm = mm;
+
+			cond_resched();
+
+			swcount = *swap_map;
+			if (!swap_count(swcount)) /* any usage ? */
+				;
+			else if (mm == &init_mm)
+				set_start_mm = 1;
+			else
+				retval = unuse_mm(mm, entry, page);
+
+			if (set_start_mm && *swap_map < swcount) {
+				mmput(new_start_mm);
+				atomic_inc(&mm->mm_users);
+				new_start_mm = mm;
+				set_start_mm = 0;
+			}
+			spin_lock(&mmlist_lock);
+		}
+		spin_unlock(&mmlist_lock);
+		mmput(prev_mm);
+		mmput(*start_mm);
+		*start_mm = new_start_mm;
+	}
+	if (retval) {
+		unlock_page(page);
+		page_cache_release(page);
+		goto out;
+	}
+
+	/*
+	 * If a reference remains (rare), we would like to leave
+	 * the page in the swap cache; but try_to_unmap could
+	 * then re-duplicate the entry once we drop page lock,
+	 * so we might loop indefinitely; also, that page could
+	 * not be swapped out to other storage meanwhile.  So:
+	 * delete from cache even if there's another reference,
+	 * after ensuring that the data has been saved to disk -
+	 * since if the reference remains (rarer), it will be
+	 * read from disk into another page.  Splitting into two
+	 * pages would be incorrect if swap supported "shared
+	 * private" pages, but they are handled by tmpfs files.
+	 *
+	 * Given how unuse_vma() targets one particular offset
+	 * in an anon_vma, once the anon_vma has been determined,
+	 * this splitting happens to be just what is needed to
+	 * handle where KSM pages have been swapped out: re-reading
+	 * is unnecessarily slow, but we can fix that later on.
+	 */
+	if (swap_count(*swap_map) &&
+	     PageDirty(page) && PageSwapCache(page)) {
+		struct writeback_control wbc = {
+			.sync_mode = WB_SYNC_NONE,
+		};
+
+		swap_writepage(page, &wbc);
+		lock_page(page);
+		wait_on_page_writeback(page);
+	}
+
+	/*
+	 * It is conceivable that a racing task removed this page from
+	 * swap cache just before we acquired the page lock at the top,
+	 * or while we dropped it in unuse_mm().  The page might even
+	 * be back in swap cache on another swap area: that we must not
+	 * delete, since it may not have been written out to swap yet.
+	 */
+	if (PageSwapCache(page) &&
+	    likely(page_private(page) == entry.val))
+		delete_from_swap_cache(page);
+
+	/*
+	 * So we could skip searching mms once swap count went
+	 * to 1, we did not mark any present ptes as dirty: must
+	 * mark page dirty so shrink_page_list will preserve it.
+	 */
+	SetPageDirty(page);
+	unlock_page(page);
+	page_cache_release(page);
+
+	/*
+	 * Make sure that we aren't completely killing
+	 * interactive performance.
+	 */
+	cond_resched();
+out:
+	return retval;
+}
+
+/*
  * We completely avoid races by reading each swap page in advance,
  * and then search for the process using it.  All the necessary
  * page table adjustments can then be made atomically.
@@ -1112,10 +1295,6 @@ int try_to_unuse(unsigned int type, bool frontswap,
 {
 	struct swap_info_struct *si = swap_info[type];
 	struct mm_struct *start_mm;
-	unsigned char *swap_map;
-	unsigned char swcount;
-	struct page *page;
-	swp_entry_t entry;
 	unsigned int i = 0;
 	int retval = 0;
 
@@ -1142,172 +1321,9 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
-		if (signal_pending(current)) {
-			retval = -EINTR;
-			break;
-		}
-
-		/*
-		 * Get a page for the entry, using the existing swap
-		 * cache page if there is one.  Otherwise, get a clean
-		 * page and read the swap into it.
-		 */
-		swap_map = &si->swap_map[i];
-		entry = swp_entry(type, i);
-		page = read_swap_cache_async(entry,
-					GFP_HIGHUSER_MOVABLE, NULL, 0);
-		if (!page) {
-			/*
-			 * Either swap_duplicate() failed because entry
-			 * has been freed independently, and will not be
-			 * reused since sys_swapoff() already disabled
-			 * allocation from here, or alloc_page() failed.
-			 */
-			if (!*swap_map)
-				continue;
-			retval = -ENOMEM;
-			break;
-		}
-
-		/*
-		 * Don't hold on to start_mm if it looks like exiting.
-		 */
-		if (atomic_read(&start_mm->mm_users) == 1) {
-			mmput(start_mm);
-			start_mm = &init_mm;
-			atomic_inc(&init_mm.mm_users);
-		}
-
-		/*
-		 * Wait for and lock page.  When do_swap_page races with
-		 * try_to_unuse, do_swap_page can handle the fault much
-		 * faster than try_to_unuse can locate the entry.  This
-		 * apparently redundant "wait_on_page_locked" lets try_to_unuse
-		 * defer to do_swap_page in such a case - in some tests,
-		 * do_swap_page and try_to_unuse repeatedly compete.
-		 */
-		wait_on_page_locked(page);
-		wait_on_page_writeback(page);
-		lock_page(page);
-		wait_on_page_writeback(page);
-
-		/*
-		 * Remove all references to entry.
-		 */
-		swcount = *swap_map;
-		if (swap_count(swcount) == SWAP_MAP_SHMEM) {
-			retval = shmem_unuse(entry, page);
-			/* page has already been unlocked and released */
-			if (retval < 0)
-				break;
-			continue;
-		}
-		if (swap_count(swcount) && start_mm != &init_mm)
-			retval = unuse_mm(start_mm, entry, page);
-
-		if (swap_count(*swap_map)) {
-			int set_start_mm = (*swap_map >= swcount);
-			struct list_head *p = &start_mm->mmlist;
-			struct mm_struct *new_start_mm = start_mm;
-			struct mm_struct *prev_mm = start_mm;
-			struct mm_struct *mm;
-
-			atomic_inc(&new_start_mm->mm_users);
-			atomic_inc(&prev_mm->mm_users);
-			spin_lock(&mmlist_lock);
-			while (swap_count(*swap_map) && !retval &&
-					(p = p->next) != &start_mm->mmlist) {
-				mm = list_entry(p, struct mm_struct, mmlist);
-				if (!atomic_inc_not_zero(&mm->mm_users))
-					continue;
-				spin_unlock(&mmlist_lock);
-				mmput(prev_mm);
-				prev_mm = mm;
-
-				cond_resched();
-
-				swcount = *swap_map;
-				if (!swap_count(swcount)) /* any usage ? */
-					;
-				else if (mm == &init_mm)
-					set_start_mm = 1;
-				else
-					retval = unuse_mm(mm, entry, page);
-
-				if (set_start_mm && *swap_map < swcount) {
-					mmput(new_start_mm);
-					atomic_inc(&mm->mm_users);
-					new_start_mm = mm;
-					set_start_mm = 0;
-				}
-				spin_lock(&mmlist_lock);
-			}
-			spin_unlock(&mmlist_lock);
-			mmput(prev_mm);
-			mmput(start_mm);
-			start_mm = new_start_mm;
-		}
-		if (retval) {
-			unlock_page(page);
-			page_cache_release(page);
+		if (try_to_unuse_swp_entry(&start_mm, si,
+					swp_entry(type, i)) != 0)
 			break;
-		}
-
-		/*
-		 * If a reference remains (rare), we would like to leave
-		 * the page in the swap cache; but try_to_unmap could
-		 * then re-duplicate the entry once we drop page lock,
-		 * so we might loop indefinitely; also, that page could
-		 * not be swapped out to other storage meanwhile.  So:
-		 * delete from cache even if there's another reference,
-		 * after ensuring that the data has been saved to disk -
-		 * since if the reference remains (rarer), it will be
-		 * read from disk into another page.  Splitting into two
-		 * pages would be incorrect if swap supported "shared
-		 * private" pages, but they are handled by tmpfs files.
-		 *
-		 * Given how unuse_vma() targets one particular offset
-		 * in an anon_vma, once the anon_vma has been determined,
-		 * this splitting happens to be just what is needed to
-		 * handle where KSM pages have been swapped out: re-reading
-		 * is unnecessarily slow, but we can fix that later on.
-		 */
-		if (swap_count(*swap_map) &&
-		     PageDirty(page) && PageSwapCache(page)) {
-			struct writeback_control wbc = {
-				.sync_mode = WB_SYNC_NONE,
-			};
-
-			swap_writepage(page, &wbc);
-			lock_page(page);
-			wait_on_page_writeback(page);
-		}
-
-		/*
-		 * It is conceivable that a racing task removed this page from
-		 * swap cache just before we acquired the page lock at the top,
-		 * or while we dropped it in unuse_mm().  The page might even
-		 * be back in swap cache on another swap area: that we must not
-		 * delete, since it may not have been written out to swap yet.
-		 */
-		if (PageSwapCache(page) &&
-		    likely(page_private(page) == entry.val))
-			delete_from_swap_cache(page);
-
-		/*
-		 * So we could skip searching mms once swap count went
-		 * to 1, we did not mark any present ptes as dirty: must
-		 * mark page dirty so shrink_page_list will preserve it.
-		 */
-		SetPageDirty(page);
-		unlock_page(page);
-		page_cache_release(page);
-
-		/*
-		 * Make sure that we aren't completely killing
-		 * interactive performance.
-		 */
-		cond_resched();
 		if (frontswap && pages_to_unuse > 0) {
 			if (!--pages_to_unuse)
 				break;
-- 
1.7.9.5


* [RFC PATCH 3/4] mm: add zbud flag to page flags
From: Krzysztof Kozlowski @ 2013-08-06  6:42 UTC
  To: Seth Jennings, linux-mm, linux-kernel, Andrew Morton
  Cc: Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski,
	Kyungmin Park, Krzysztof Kozlowski

Add a PageZbud flag to the page flags to distinguish pages allocated by
zbud. Currently these pages do not have any flags set.
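
PAGEFLAG(Zbud, zbud) generates the usual accessors; a sketch of how a pfn
scanner can use them (the real user is the isolate_migratepages_range()
check added in patch 4/4):

static bool example_is_zbud_page(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);

	/* Set in zbud_alloc(), cleared in put_zbud_page() before freeing;
	 * always false when CONFIG_ZBUD=n thanks to PAGEFLAG_FALSE(Zbud). */
	return PageZbud(page);
}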

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
---
 include/linux/page-flags.h |   12 ++++++++++++
 mm/page_alloc.c            |    3 +++
 mm/zbud.c                  |    4 ++++
 3 files changed, 19 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..5b8b61a6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -109,6 +109,12 @@ enum pageflags {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	PG_compound_lock,
 #endif
+#ifdef CONFIG_ZBUD
+	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
+	 * during migration/compaction.
+	 */
+	PG_zbud,
+#endif
 	__NR_PAGEFLAGS,
 
 	/* Filesystems */
@@ -275,6 +281,12 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif
 
+#ifdef CONFIG_ZBUD
+PAGEFLAG(Zbud, zbud)
+#else
+PAGEFLAG_FALSE(Zbud)
+#endif
+
 u64 stable_page_flags(struct page *page);
 
 static inline int PageUptodate(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b100255..1a120fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6345,6 +6345,9 @@ static const struct trace_print_flags pageflag_names[] = {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	{1UL << PG_compound_lock,	"compound_lock"	},
 #endif
+#ifdef CONFIG_ZBUD
+	{1UL << PG_zbud,		"zbud"		},
+#endif
 };
 
 static void dump_page_flags(unsigned long flags)
diff --git a/mm/zbud.c b/mm/zbud.c
index a8e986f..a452949 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -230,7 +230,10 @@ static void get_zbud_page(struct zbud_header *zhdr)
 static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
 {
 	struct page *page = virt_to_page(zhdr);
+	BUG_ON(!PageZbud(page));
+
 	if (put_page_testzero(page)) {
+		ClearPageZbud(page);
 		free_hot_cold_page(page, 0);
 		pool->pages_nr--;
 		return 1;
@@ -341,6 +344,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 	 * don't increase the page count.
 	 */
 	zhdr = init_zbud_page(page);
+	SetPageZbud(page);
 	bud = FIRST;
 
 found:
-- 
1.7.9.5


* [RFC PATCH 4/4] mm: reclaim zbud pages on migration and compaction
From: Krzysztof Kozlowski @ 2013-08-06  6:42 UTC
  To: Seth Jennings, linux-mm, linux-kernel, Andrew Morton
  Cc: Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski,
	Kyungmin Park, Krzysztof Kozlowski

Reclaim zbud pages during migration and compaction by unusing the stored
data. This allows adding the __GFP_RECLAIMABLE flag when allocating zbud
pages, so effectively the CMA pool can be used for zswap.

Zbud pages are not movable and are not stored on any LRU (except zbud's
own LRU). The PageZbud flag is used in isolate_migratepages_range() to
grab zbud pages and pass them on for reclaim.

This reclaim process is different from zbud_reclaim_page(). It acts more
like swapoff(), trying to unuse the pages stored in a zbud page and bring
them back into memory. The standard zbud_reclaim_page(), on the other
hand, tries to write them back.
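
The zswap side of the new unuse callback is not walked through here, but
its expected shape, built on the helper from patch 2/4, is roughly the
following (a sketch with a hypothetical name; the real callback would
also have to invalidate the matching zswap tree entry and free the
handle):

static int zswap_unuse_entry(struct zbud_pool *pool, unsigned long handle)
{
	struct zswap_header *zhdr = zbud_map(pool, handle);
	swp_entry_t swpentry = zhdr->swpentry;
	struct mm_struct *start_mm = &init_mm;
	int ret;

	zbud_unmap(pool, handle);

	/* bring the swapped-out data back into memory, swapoff-style */
	atomic_inc(&init_mm.mm_users);
	ret = try_to_unuse_swp_entry(&start_mm,
				     swap_info[swp_type(swpentry)], swpentry);
	mmput(start_mm);

	return ret;
}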

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
---
 include/linux/zbud.h |   11 +++-
 mm/compaction.c      |   20 ++++++-
 mm/internal.h        |    1 +
 mm/page_alloc.c      |    6 ++
 mm/zbud.c            |  163 +++++++++++++++++++++++++++++++++++++++-----------
 mm/zswap.c           |   57 ++++++++++++++++--
 6 files changed, 215 insertions(+), 43 deletions(-)

diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 2571a5c..57ee85d 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -5,8 +5,14 @@
 
 struct zbud_pool;
 
+/**
+ * Template for functions called during reclaim.
+ */
+typedef int (*evict_page_t)(struct zbud_pool *pool, unsigned long handle);
+
 struct zbud_ops {
-	int (*evict)(struct zbud_pool *pool, unsigned long handle);
+	evict_page_t evict; /* callback for zbud_reclaim_lru_page() */
+	evict_page_t unuse; /* callback for zbud_reclaim_pages() */
 };
 
 struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops);
@@ -14,7 +20,8 @@ void zbud_destroy_pool(struct zbud_pool *pool);
 int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 	unsigned long *handle);
 void zbud_free(struct zbud_pool *pool, unsigned long handle);
-int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
+int zbud_reclaim_lru_page(struct zbud_pool *pool, unsigned int retries);
+void zbud_reclaim_pages(struct list_head *zbud_pages);
 void *zbud_map(struct zbud_pool *pool, unsigned long handle);
 void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
 u64 zbud_get_pool_size(struct zbud_pool *pool);
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..9bbf412 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -16,6 +16,7 @@
 #include <linux/sysfs.h>
 #include <linux/balloon_compaction.h>
 #include <linux/page-isolation.h>
+#include <linux/zbud.h>
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
@@ -534,6 +535,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			goto next_pageblock;
 		}
 
+		if (PageZbud(page)) {
+			/*
+			 * Zbud pages do not exist in LRU so we must
+			 * check for Zbud flag before PageLRU() below.
+			 */
+			BUG_ON(PageLRU(page));
+			get_page(page);
+			list_add(&page->lru, &cc->zbudpages);
+			continue;
+		}
+
 		/*
 		 * Check may be lockless but that's ok as we recheck later.
 		 * It's possible to migrate LRU pages and balloon pages
@@ -810,7 +822,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn, false);
 	if (!low_pfn || cc->contended)
 		return ISOLATE_ABORT;
-
+#ifdef CONFIG_ZBUD
+	if (!list_empty(&cc->zbudpages))
+		zbud_reclaim_pages(&cc->zbudpages);
+#endif
 	cc->migrate_pfn = low_pfn;
 
 	return ISOLATE_SUCCESS;
@@ -1023,11 +1038,13 @@ static unsigned long compact_zone_order(struct zone *zone,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
+	INIT_LIST_HEAD(&cc.zbudpages);
 
 	ret = compact_zone(zone, &cc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
+	VM_BUG_ON(!list_empty(&cc.zbudpages));
 
 	*contended = cc.contended;
 	return ret;
@@ -1105,6 +1122,7 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		cc->zone = zone;
 		INIT_LIST_HEAD(&cc->freepages);
 		INIT_LIST_HEAD(&cc->migratepages);
+		INIT_LIST_HEAD(&cc->zbudpages);
 
 		if (cc->order == -1 || !compaction_deferred(zone, cc->order))
 			compact_zone(zone, cc);
diff --git a/mm/internal.h b/mm/internal.h
index 4390ac6..eaf5c884 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -119,6 +119,7 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	struct list_head zbudpages;	/* List of pages belonging to zbud */
 	bool sync;			/* Synchronous migration */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool finished_update_free;	/* True when the zone cached pfns are
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1a120fb..e482876 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -60,6 +60,7 @@
 #include <linux/page-debug-flags.h>
 #include <linux/hugetlb.h>
 #include <linux/sched/rt.h>
+#include <linux/zbud.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -6031,6 +6032,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				ret = -EINTR;
 				break;
 			}
+#ifdef CONFIG_ZBUD
+			if (!list_empty(&cc.zbudpages))
+				zbud_reclaim_pages(&cc.zbudpages);
+#endif
 			tries = 0;
 		} else if (++tries == 5) {
 			ret = ret < 0 ? ret : -EBUSY;
@@ -6085,6 +6090,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		.ignore_skip_hint = true,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
+	INIT_LIST_HEAD(&cc.zbudpages);
 
 	/*
 	 * What we do here is we mark all pageblocks in range as
diff --git a/mm/zbud.c b/mm/zbud.c
index a452949..98a04c8 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -103,12 +103,14 @@ struct zbud_pool {
  * @lru:	links the zbud page into the lru list in the pool
  * @first_chunks:	the size of the first buddy in chunks, 0 if free
  * @last_chunks:	the size of the last buddy in chunks, 0 if free
+ * @pool:		pool to which this zbud page belongs to
  */
 struct zbud_header {
 	struct list_head buddy;
 	struct list_head lru;
 	unsigned int first_chunks;
 	unsigned int last_chunks;
+	struct zbud_pool *pool;
 };
 
 /*****************
@@ -137,6 +139,7 @@ static struct zbud_header *init_zbud_page(struct page *page)
 	zhdr->last_chunks = 0;
 	INIT_LIST_HEAD(&zhdr->buddy);
 	INIT_LIST_HEAD(&zhdr->lru);
+	zhdr->pool = NULL;
 	return zhdr;
 }
 
@@ -241,7 +244,6 @@ static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
 	return 0;
 }
 
-
 /*****************
  * API Functions
 *****************/
@@ -345,6 +347,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 	 */
 	zhdr = init_zbud_page(page);
 	SetPageZbud(page);
+	zhdr->pool = pool;
 	bud = FIRST;
 
 found:
@@ -394,8 +397,57 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
 #define list_tail_entry(ptr, type, member) \
 	list_entry((ptr)->prev, type, member)
 
+/*
+ * Pool lock must be held when calling this function and at least
+ * one handle must not be free.
+ * On return the pool lock will still be held, however during the
+ * execution it is dropped and re-acquired around the call to the
+ * evict callback.
+ *
+ * Returns 1 if page was freed here, 0 otherwise (still in use)
+ */
+static int do_reclaim(struct zbud_pool *pool, struct zbud_header *zhdr,
+				evict_page_t evict_cb)
+{
+	int ret;
+	unsigned long first_handle = 0, last_handle = 0;
+
+	BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);
+	/* Move this last element to beginning of LRU */
+	list_del(&zhdr->lru);
+	list_add(&zhdr->lru, &pool->lru);
+	/* Protect zbud page against free */
+	get_zbud_page(zhdr);
+	/*
+	 * We need to encode the handles before unlocking, since we can
+	 * race with free that will set (first|last)_chunks to 0
+	 */
+	first_handle = 0;
+	last_handle = 0;
+	if (zhdr->first_chunks)
+		first_handle = encode_handle(zhdr, FIRST);
+	if (zhdr->last_chunks)
+		last_handle = encode_handle(zhdr, LAST);
+	spin_unlock(&pool->lock);
+
+	/* Issue the eviction callback(s) */
+	if (first_handle) {
+		ret = evict_cb(pool, first_handle);
+		if (ret)
+			goto next;
+	}
+	if (last_handle) {
+		ret = evict_cb(pool, last_handle);
+		if (ret)
+			goto next;
+	}
+next:
+	spin_lock(&pool->lock);
+	return put_zbud_page(pool, zhdr);
+}
+
 /**
- * zbud_reclaim_page() - evicts allocations from a pool page and frees it
+ * zbud_reclaim_lru_page() - evicts allocations from a pool page and frees it
  * @pool:	pool from which a page will attempt to be evicted
  * @retires:	number of pages on the LRU list for which eviction will
  *		be attempted before failing
@@ -429,11 +481,10 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
  * no pages to evict or an eviction handler is not registered, -EAGAIN if
  * the retry limit was hit.
  */
-int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
+int zbud_reclaim_lru_page(struct zbud_pool *pool, unsigned int retries)
 {
-	int i, ret;
+	int i;
 	struct zbud_header *zhdr;
-	unsigned long first_handle = 0, last_handle = 0;
 
 	spin_lock(&pool->lock);
 	if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) ||
@@ -454,44 +505,84 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 			return 0;
 		}
 		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
-		BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);
-		/* Move this last element to beginning of LRU */
-		list_del(&zhdr->lru);
-		list_add(&zhdr->lru, &pool->lru);
-		/* Protect zbud page against free */
-		get_zbud_page(zhdr);
-		/*
-		 * We need encode the handles before unlocking, since we can
-		 * race with free that will set (first|last)_chunks to 0
-		 */
-		first_handle = 0;
-		last_handle = 0;
-		if (zhdr->first_chunks)
-			first_handle = encode_handle(zhdr, FIRST);
-		if (zhdr->last_chunks)
-			last_handle = encode_handle(zhdr, LAST);
-		spin_unlock(&pool->lock);
-
-		/* Issue the eviction callback(s) */
-		if (first_handle) {
-			ret = pool->ops->evict(pool, first_handle);
-			if (ret)
-				goto next;
+		if (do_reclaim(pool, zhdr, pool->ops->evict)) {
+			spin_unlock(&pool->lock);
+			return 0;
 		}
-		if (last_handle) {
-			ret = pool->ops->evict(pool, last_handle);
-			if (ret)
-				goto next;
+	}
+	spin_unlock(&pool->lock);
+	return -EAGAIN;
+}
+
+
+/**
+ * zbud_reclaim_pages() - reclaims zbud pages by unusing stored pages
+ * @zbud_pages:	list of zbud pages to reclaim
+ *
+ * zbud reclaim is different from normal system reclaim in that the reclaim is
+ * done from the bottom, up.  This is because only the bottom layer, zbud, has
+ * information on how the allocations are organized within each zbud page. This
+ * has the potential to create interesting locking situations between zbud and
+ * the user, however.
+ *
+ * To avoid these, this is how zbud_reclaim_pages() should be called:
+ *
+ * The user detects some pages should be reclaimed and calls
+ * zbud_reclaim_pages(). The zbud_reclaim_pages() will remove zbud
+ * pages from the pool LRU list and call the user-defined unuse handler with
+ * the pool and handle as arguments.
+ *
+ * If the handle can not be unused, the unuse handler should return
+ * non-zero. zbud_reclaim_pages() will add the zbud page back to the
+ * appropriate list and try the next zbud page on the list.
+ *
+ * If the handle is successfully unused, the unuse handler should
+ * return 0.
+ * The zbud page will be freed later by unuse code
+ * (e.g. frontswap_invalidate_page()).
+ *
+ * If all buddies in the zbud page are successfully unused, then the
+ * zbud page can be freed.
+ */
+void zbud_reclaim_pages(struct list_head *zbud_pages)
+{
+	struct page *page;
+	struct page *page2;
+
+	list_for_each_entry_safe(page, page2, zbud_pages, lru) {
+		struct zbud_header *zhdr;
+		struct zbud_pool *pool;
+
+		list_del(&page->lru);
+		if (!PageZbud(page)) {
+			/*
+			 * Drop page count from isolate_migratepages_range()
+			 */
+			put_page(page);
+			continue;
 		}
-next:
+		zhdr = page_address(page);
+		BUG_ON(!zhdr->pool);
+		pool = zhdr->pool;
+
 		spin_lock(&pool->lock);
+		/* Drop page count from isolate_migratepages_range() */
 		if (put_zbud_page(pool, zhdr)) {
+			/*
+			 * zbud_free() could free the handles before acquiring
+			 * pool lock above. No need to reclaim.
+			 */
 			spin_unlock(&pool->lock);
-			return 0;
+			continue;
+		}
+		if (!pool->ops || !pool->ops->unuse || list_empty(&pool->lru)) {
+			spin_unlock(&pool->lock);
+			continue;
 		}
+		BUG_ON(!PageZbud(page));
+		do_reclaim(pool, zhdr, pool->ops->unuse);
+		spin_unlock(&pool->lock);
 	}
-	spin_unlock(&pool->lock);
-	return -EAGAIN;
 }
 
 /**
diff --git a/mm/zswap.c b/mm/zswap.c
index deda2b6..846649b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -35,6 +35,9 @@
 #include <linux/crypto.h>
 #include <linux/mempool.h>
 #include <linux/zbud.h>
+#include <linux/swapfile.h>
+#include <linux/mman.h>
+#include <linux/security.h>
 
 #include <linux/mm_types.h>
 #include <linux/page-flags.h>
@@ -61,6 +64,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 static u64 zswap_pool_limit_hit;
 /* Pages written back when pool limit was reached */
 static u64 zswap_written_back_pages;
+/* Pages unused due to reclaim */
+static u64 zswap_unused_pages;
 /* Store failed due to a reclaim failure after pool limit was reached */
 static u64 zswap_reject_reclaim_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
@@ -596,6 +601,47 @@ fail:
 	return ret;
 }
 
+/**
+ * Tries to unuse swap entries by uncompressing them.
+ * Function is a stripped swapfile.c::try_to_unuse().
+ *
+ * Returns 0 on success or negative on error.
+ */
+static int zswap_unuse_entry(struct zbud_pool *pool, unsigned long handle)
+{
+	struct zswap_header *zhdr;
+	swp_entry_t swpentry;
+	struct zswap_tree *tree;
+	pgoff_t offset;
+	struct mm_struct *start_mm;
+	struct swap_info_struct *si;
+	int ret;
+
+	/* extract swpentry from data */
+	zhdr = zbud_map(pool, handle);
+	swpentry = zhdr->swpentry; /* here */
+	zbud_unmap(pool, handle);
+	tree = zswap_trees[swp_type(swpentry)];
+	offset = swp_offset(swpentry);
+	BUG_ON(pool != tree->pool);
+
+	/*
+	 * We cannot hold swap_lock here but swap_info may
+	 * change (e.g. by swapoff). In case of swapoff
+	 * check for SWP_WRITEOK.
+	 */
+	si = swap_info[swp_type(swpentry)];
+	if (!(si->flags & SWP_WRITEOK))
+		return -ECANCELED;
+
+	start_mm = &init_mm;
+	atomic_inc(&init_mm.mm_users);
+	ret = try_to_unuse_swp_entry(&start_mm, si, swpentry);
+	mmput(start_mm);
+	zswap_unused_pages++;
+	return ret;
+}
+
 /*********************************
 * frontswap hooks
 **********************************/
@@ -620,7 +666,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	/* reclaim space if needed */
 	if (zswap_is_full()) {
 		zswap_pool_limit_hit++;
-		if (zbud_reclaim_page(tree->pool, 8)) {
+		if (zbud_reclaim_lru_page(tree->pool, 8)) {
 			zswap_reject_reclaim_fail++;
 			ret = -ENOMEM;
 			goto reject;
@@ -647,8 +693,8 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 
 	/* store */
 	len = dlen + sizeof(struct zswap_header);
-	ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN,
-		&handle);
+	ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN |
+						__GFP_RECLAIMABLE, &handle);
 	if (ret == -ENOSPC) {
 		zswap_reject_compress_poor++;
 		goto freepage;
@@ -819,7 +865,8 @@ static void zswap_frontswap_invalidate_area(unsigned type)
 }
 
 static struct zbud_ops zswap_zbud_ops = {
-	.evict = zswap_writeback_entry
+	.evict = zswap_writeback_entry,
+	.unuse = zswap_unuse_entry
 };
 
 static void zswap_frontswap_init(unsigned type)
@@ -880,6 +927,8 @@ static int __init zswap_debugfs_init(void)
 			zswap_debugfs_root, &zswap_reject_compress_poor);
 	debugfs_create_u64("written_back_pages", S_IRUGO,
 			zswap_debugfs_root, &zswap_written_back_pages);
+	debugfs_create_u64("unused_pages", S_IRUGO,
+			zswap_debugfs_root, &zswap_unused_pages);
 	debugfs_create_u64("duplicate_entry", S_IRUGO,
 			zswap_debugfs_root, &zswap_duplicate_entry);
 	debugfs_create_u64("pool_pages", S_IRUGO,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-06  6:42   ` Krzysztof Kozlowski
@ 2013-08-06  9:00     ` Bob Liu
  -1 siblings, 0 replies; 38+ messages in thread
From: Bob Liu @ 2013-08-06  9:00 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Tomasz Stanislawski

Hi Krzysztof,

On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote:
> Use page reference counter for zbud pages. The ref counter replaces
> zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> when zbud_free() is called during reclaim. It allows implementation of
> additional reclaim paths.
> 
> The page count is incremented when:
>  - a handle is created and passed to zswap (in zbud_alloc()),
>  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>

Looks good to me.
Reviewed-by: Bob Liu <bob.liu@oracle.com>

> ---
>  mm/zbud.c |  150 +++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 86 insertions(+), 64 deletions(-)
> 
> diff --git a/mm/zbud.c b/mm/zbud.c
> index ad1e781..a8e986f 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -109,7 +109,6 @@ struct zbud_header {
>  	struct list_head lru;
>  	unsigned int first_chunks;
>  	unsigned int last_chunks;
> -	bool under_reclaim;
>  };
>  
>  /*****************
> @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
>  	zhdr->last_chunks = 0;
>  	INIT_LIST_HEAD(&zhdr->buddy);
>  	INIT_LIST_HEAD(&zhdr->lru);
> -	zhdr->under_reclaim = 0;
>  	return zhdr;
>  }
>  
> -/* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> -{
> -	__free_page(virt_to_page(zhdr));
> -}
> -
>  /*
>   * Encodes the handle of a particular buddy within a zbud page
>   * Pool lock should be held as this function accesses first|last_chunks
> @@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr)
>  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
>  }
>  
> +/*
> + * Called after zbud_free() or zbud_alloc().
> + * Checks whether given zbud page has to be:
> + *  - removed from buddied/unbuddied/LRU lists completetely (zbud_free).
> + *  - moved from buddied to unbuddied list
> + *    and to beginning of LRU (zbud_alloc, zbud_free),
> + *  - added to buddied list and LRU (zbud_alloc),
> + *
> + * The page must be already removed from buddied/unbuddied lists.
> + * Must be called under pool->lock.
> + */
> +static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr)
> +{

Nitpicking, but how about changing the name to adjust_lists() or
something like that, since we don't do any rebalancing?

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
  2013-08-06  6:42 ` Krzysztof Kozlowski
@ 2013-08-06  9:16   ` Bob Liu
  -1 siblings, 0 replies; 38+ messages in thread
From: Bob Liu @ 2013-08-06  9:16 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote:
> Hi,
> 
> Currently zbud pages are not movable and they cannot be allocated from CMA
> region. These patches try to address the problem by:
> 1. Adding a new form of reclaim of zbud pages.
> 2. Reclaiming zbud pages during migration and compaction.
> 3. Allocating zbud pages with __GFP_RECLAIMABLE flag.
> 
> This reclaim process is different than zbud_reclaim_page(). It acts more
> like swapoff() by trying to unuse pages stored in zbud page and bring
> them back to memory. The standard zbud_reclaim_page() on the other hand
> tries to write them back.

I'd prefer to migrate zbud pages directly, if possible, rather than
reclaiming them during compaction.

> 
> One of patches introduces a new flag: PageZbud. This flag is used in
> isolate_migratepages_range() to grab zbud pages and pass them later
> for reclaim. Probably this could be replaced with something
> smarter than a flag used only in one case.
> Any ideas for a better solution are welcome.
> 
> This patch set is based on Linux 3.11-rc4.
> 
> TODOs:
> 1. Replace PageZbud flag with other solution.
> 
> Best regards,
> Krzysztof Kozlowski
> 
> 
> Krzysztof Kozlowski (4):
>   zbud: use page ref counter for zbud pages
>   mm: split code for unusing swap entries from try_to_unuse
>   mm: add zbud flag to page flags
>   mm: reclaim zbud pages on migration and compaction
> 
>  include/linux/page-flags.h |   12 ++
>  include/linux/swapfile.h   |    2 +
>  include/linux/zbud.h       |   11 +-
>  mm/compaction.c            |   20 ++-
>  mm/internal.h              |    1 +
>  mm/page_alloc.c            |    9 ++
>  mm/swapfile.c              |  354 +++++++++++++++++++++++---------------------
>  mm/zbud.c                  |  301 +++++++++++++++++++++++++------------
>  mm/zswap.c                 |   57 ++++++-
>  9 files changed, 499 insertions(+), 268 deletions(-)
> 

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-06  9:00     ` Bob Liu
@ 2013-08-06  9:25       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-08-06  9:25 UTC (permalink / raw)
  To: Bob Liu
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Tomasz Stanislawski

Hi Bob,

Thank you for the review.

On Tue, 2013-08-06 at 17:00 +0800, Bob Liu wrote:
> Nitpicking, but how about changing the name to adjust_lists() or
> something like that, since we don't do any rebalancing?

OK, I'll change it.

Best regards,
Krzysztof



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
  2013-08-06  9:16   ` Bob Liu
@ 2013-08-06 13:05     ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-08-06 13:05 UTC (permalink / raw)
  To: Bob Liu
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On Tue, 2013-08-06 at 17:16 +0800, Bob Liu wrote:
> On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote:
> > This reclaim process is different than zbud_reclaim_page(). It acts more
> > like swapoff() by trying to unuse pages stored in zbud page and bring
> > them back to memory. The standard zbud_reclaim_page() on the other hand
> > tries to write them back.
> 
> I'd prefer to migrate zbud pages directly, if possible, rather than
> reclaiming them during compaction.

I think it is possible, however it would definitely be more complex. In
case of migration the zswap handles would have to be updated, as they are
just virtual addresses. Am I right?
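
For reference, a zbud handle that zswap stores is essentially just an
address inside the zbud page; roughly what encode_handle() in mm/zbud.c
does (simplified sketch, not the literal code):

	handle = (unsigned long)zhdr;
	if (bud == FIRST)
		handle += ZHDR_SIZE_ALIGNED;	/* skip over the zbud header */
	else	/* bud == LAST */
		handle += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT);

So after copying the data to a new page, every handle already handed out
to zswap would have to be rewritten (or handles would need an indirection
layer), which is the extra complexity I mean.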

Best regards,
Krzysztof



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
  2013-08-06  6:42   ` Krzysztof Kozlowski
@ 2013-08-06 16:58     ` Dave Hansen
  -1 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2013-08-06 16:58 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote:
> +#ifdef CONFIG_ZBUD
> +	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> +	 * during migration/compaction.
> +	 */
> +	PG_zbud,
> +#endif

Do you _really_ need an absolutely new, unshared page flag?
The zbud code doesn't really look like it uses any of the space in
'struct page'.

I think you could pretty easily alias PG_zbud=PG_slab, then use the
page->{private,slab_cache} (or some other unused field) in 'struct page'
to store a cookie to differentiate slab and zbud pages.
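
Very rough sketch of what I mean (the helper names and cookie value are
made up here, not a tested patch):

	#define ZBUD_PAGE_COOKIE	0x7a627564UL	/* arbitrary marker */

	static void mark_zbud_page(struct page *page)
	{
		__SetPageSlab(page);
		set_page_private(page, ZBUD_PAGE_COOKIE);
	}

	static bool is_zbud_page(struct page *page)
	{
		return PageSlab(page) &&
		       page_private(page) == ZBUD_PAGE_COOKIE;
	}

with __ClearPageSlab() and set_page_private(page, 0) on the free path.
The cost is that anything testing PageSlab() can now also see zbud pages
and has to look at the cookie, but it avoids burning a brand new page flag.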

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-06  6:42   ` Krzysztof Kozlowski
  (?)
  (?)
@ 2013-08-06 18:51   ` Seth Jennings
  2013-08-07  7:31       ` Krzysztof Kozlowski
  -1 siblings, 1 reply; 38+ messages in thread
From: Seth Jennings @ 2013-08-06 18:51 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Tomasz Stanislawski

On Tue, Aug 06, 2013 at 08:42:38AM +0200, Krzysztof Kozlowski wrote:
> Use page reference counter for zbud pages. The ref counter replaces
> zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> when zbud_free() is called during reclaim. It allows implementation of
> additional reclaim paths.
> 
> The page count is incremented when:
>  - a handle is created and passed to zswap (in zbud_alloc()),
>  - user-supplied eviction callback is called (in zbud_reclaim_page()).

I like the idea.  A few things below.  Also agree with Bob on the
s/rebalance/adjust/ rename for rebalance_lists().

> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> ---
>  mm/zbud.c |  150 +++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 86 insertions(+), 64 deletions(-)
> 
> diff --git a/mm/zbud.c b/mm/zbud.c
> index ad1e781..a8e986f 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -109,7 +109,6 @@ struct zbud_header {
>  	struct list_head lru;
>  	unsigned int first_chunks;
>  	unsigned int last_chunks;
> -	bool under_reclaim;
>  };
> 
>  /*****************
> @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
>  	zhdr->last_chunks = 0;
>  	INIT_LIST_HEAD(&zhdr->buddy);
>  	INIT_LIST_HEAD(&zhdr->lru);
> -	zhdr->under_reclaim = 0;
>  	return zhdr;
>  }
> 
> -/* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> -{
> -	__free_page(virt_to_page(zhdr));
> -}
> -
>  /*
>   * Encodes the handle of a particular buddy within a zbud page
>   * Pool lock should be held as this function accesses first|last_chunks
> @@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr)
>  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
>  }
> 
> +/*
> + * Called after zbud_free() or zbud_alloc().
> + * Checks whether given zbud page has to be:
> + *  - removed from buddied/unbuddied/LRU lists completetely (zbud_free).
> + *  - moved from buddied to unbuddied list
> + *    and to beginning of LRU (zbud_alloc, zbud_free),
> + *  - added to buddied list and LRU (zbud_alloc),
> + *
> + * The page must be already removed from buddied/unbuddied lists.
> + * Must be called under pool->lock.
> + */
> +static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr)
> +{
> +	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> +		/* zbud_free() */
> +		list_del(&zhdr->lru);
> +		return;
> +	} else if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {

s/else if/if/ since the if above returns if true.

> +		/* zbud_free() or zbud_alloc() */
> +		int freechunks = num_free_chunks(zhdr);
> +		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> +	} else {
> +		/* zbud_alloc() */
> +		list_add(&zhdr->buddy, &pool->buddied);
> +	}
> +	/* Add/move zbud page to beginning of LRU */
> +	if (!list_empty(&zhdr->lru))
> +		list_del(&zhdr->lru);

We don't want to reinsert to the LRU list if we have called zbud_free()
on a zbud page that previously had two buddies.  This code causes the
zbud page to move to the front of the LRU list which is not what we want.
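
One possible shape, where only the alloc path touches the LRU (untested
sketch; the adjust_lists() name is from Bob's comment and the is_alloc
flag is just illustrative):

static void adjust_lists(struct zbud_pool *pool, struct zbud_header *zhdr,
			 bool is_alloc)
{
	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
		/* page is completely free, drop it from the LRU */
		list_del(&zhdr->lru);
		return;
	}
	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0)
		list_add(&zhdr->buddy,
			 &pool->unbuddied[num_free_chunks(zhdr)]);
	else
		list_add(&zhdr->buddy, &pool->buddied);
	/* only a fresh allocation should bump the page to the LRU head */
	if (is_alloc)
		list_move(&zhdr->lru, &pool->lru);
}

with zbud_alloc() passing true and zbud_free() passing false.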

> +	list_add(&zhdr->lru, &pool->lru);
> +}
> +
> +/*
> + * Increases ref count for zbud page.
> + */
> +static void get_zbud_page(struct zbud_header *zhdr)
> +{
> +	get_page(virt_to_page(zhdr));
> +}
> +
> +/*
> + * Decreases ref count for zbud page and frees the page if it reaches 0
> + * (no external references, e.g. handles).
> + *
> + * Must be called under pool->lock.
> + *
> + * Returns 1 if page was freed and 0 otherwise.
> + */
> +static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
> +{
> +	struct page *page = virt_to_page(zhdr);
> +	if (put_page_testzero(page)) {
> +		free_hot_cold_page(page, 0);
> +		pool->pages_nr--;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +
>  /*****************
>   * API Functions
>  *****************/
> @@ -250,7 +301,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
>  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  			unsigned long *handle)
>  {
> -	int chunks, i, freechunks;
> +	int chunks, i;
>  	struct zbud_header *zhdr = NULL;
>  	enum buddy bud;
>  	struct page *page;
> @@ -273,6 +324,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  				bud = FIRST;
>  			else
>  				bud = LAST;
> +			get_zbud_page(zhdr);
>  			goto found;
>  		}
>  	}
> @@ -284,6 +336,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  		return -ENOMEM;
>  	spin_lock(&pool->lock);
>  	pool->pages_nr++;
> +	/*
> +	 * We will be using zhdr instead of page, so
> +	 * don't increase the page count.
> +	 */
>  	zhdr = init_zbud_page(page);
>  	bud = FIRST;
> 
> @@ -293,19 +349,7 @@ found:
>  	else
>  		zhdr->last_chunks = chunks;
> 
> -	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
> -		/* Add to unbuddied list */
> -		freechunks = num_free_chunks(zhdr);
> -		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> -	} else {
> -		/* Add to buddied list */
> -		list_add(&zhdr->buddy, &pool->buddied);
> -	}
> -
> -	/* Add/move zbud page to beginning of LRU */
> -	if (!list_empty(&zhdr->lru))
> -		list_del(&zhdr->lru);
> -	list_add(&zhdr->lru, &pool->lru);
> +	rebalance_lists(pool, zhdr);
> 
>  	*handle = encode_handle(zhdr, bud);
>  	spin_unlock(&pool->lock);
> @@ -326,10 +370,10 @@ found:
>  void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  {
>  	struct zbud_header *zhdr;
> -	int freechunks;
> 
>  	spin_lock(&pool->lock);
>  	zhdr = handle_to_zbud_header(handle);
> +	BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0);

Not sure we need this.  Maybe, at most, VM_BUG_ON()?
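
i.e. something like (sketch):

	VM_BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0);

so the check compiles away unless CONFIG_DEBUG_VM is set.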

> 
>  	/* If first buddy, handle will be page aligned */
>  	if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK)
> @@ -337,26 +381,9 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  	else
>  		zhdr->first_chunks = 0;
> 
> -	if (zhdr->under_reclaim) {
> -		/* zbud page is under reclaim, reclaim will free */
> -		spin_unlock(&pool->lock);
> -		return;
> -	}
> -
> -	/* Remove from existing buddy list */
>  	list_del(&zhdr->buddy);
> -
> -	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> -		/* zbud page is empty, free */
> -		list_del(&zhdr->lru);
> -		free_zbud_page(zhdr);
> -		pool->pages_nr--;
> -	} else {
> -		/* Add to unbuddied list */
> -		freechunks = num_free_chunks(zhdr);
> -		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> -	}
> -
> +	rebalance_lists(pool, zhdr);
> +	put_zbud_page(pool, zhdr);
>  	spin_unlock(&pool->lock);
>  }
> 
> @@ -400,7 +427,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
>   */
>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  {
> -	int i, ret, freechunks;
> +	int i, ret;
>  	struct zbud_header *zhdr;
>  	unsigned long first_handle = 0, last_handle = 0;
> 
> @@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  		return -EINVAL;
>  	}
>  	for (i = 0; i < retries; i++) {
> +		if (list_empty(&pool->lru)) {
> +			/*
> +			 * LRU was emptied during evict calls in previous
> +			 * iteration but put_zbud_page() returned 0 meaning
> +			 * that someone still holds the page. This may
> +			 * happen when some other mm mechanism increased
> +			 * the page count.
> +			 * In such case we succedded with reclaim.
> +			 */
> +			return 0;
> +		}
>  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> +		BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);

Again here.

Thanks,
Seth

> +		/* Move this last element to beginning of LRU */
>  		list_del(&zhdr->lru);
> -		list_del(&zhdr->buddy);
> +		list_add(&zhdr->lru, &pool->lru);
>  		/* Protect zbud page against free */
> -		zhdr->under_reclaim = true;
> +		get_zbud_page(zhdr);
>  		/*
>  		 * We need encode the handles before unlocking, since we can
>  		 * race with free that will set (first|last)_chunks to 0
> @@ -441,28 +481,10 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  		}
>  next:
>  		spin_lock(&pool->lock);
> -		zhdr->under_reclaim = false;
> -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> -			/*
> -			 * Both buddies are now free, free the zbud page and
> -			 * return success.
> -			 */
> -			free_zbud_page(zhdr);
> -			pool->pages_nr--;
> +		if (put_zbud_page(pool, zhdr)) {
>  			spin_unlock(&pool->lock);
>  			return 0;
> -		} else if (zhdr->first_chunks == 0 ||
> -				zhdr->last_chunks == 0) {
> -			/* add to unbuddied list */
> -			freechunks = num_free_chunks(zhdr);
> -			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> -		} else {
> -			/* add to buddied list */
> -			list_add(&zhdr->buddy, &pool->buddied);
>  		}
> -
> -		/* add to beginning of LRU */
> -		list_add(&zhdr->lru, &pool->lru);
>  	}
>  	spin_unlock(&pool->lock);
>  	return -EAGAIN;
> -- 
> 1.7.9.5
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
  2013-08-06  6:42   ` Krzysztof Kozlowski
  (?)
  (?)
@ 2013-08-06 18:57   ` Seth Jennings
  -1 siblings, 0 replies; 38+ messages in thread
From: Seth Jennings @ 2013-08-06 18:57 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On Tue, Aug 06, 2013 at 08:42:40AM +0200, Krzysztof Kozlowski wrote:
> Add PageZbud flag to page flags to distinguish pages allocated in zbud.
> Currently these pages do not have any flags set.

Yeah, using a page flag for zbud is probably not going to be
acceptable.  We'll have to find some other way to identify zbud pages.

Seth


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
  2013-08-06 16:58     ` Dave Hansen
@ 2013-08-07  7:04       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-08-07  7:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On wto, 2013-08-06 at 09:58 -0700, Dave Hansen wrote:
> On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote:
> > +#ifdef CONFIG_ZBUD
> > +	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> > +	 * during migration/compaction.
> > +	 */
> > +	PG_zbud,
> > +#endif
> 
> Do you _really_ need an absolutely new, unshared page flag?
> The zbud code doesn't really look like it uses any of the space in
> 'struct page'.
> 
> I think you could pretty easily alias PG_zbud=PG_slab, then use the
> page->{private,slab_cache} (or some other unused field) in 'struct page'
> to store a cookie to differentiate slab and zbud pages.

Thanks for the idea, I will try that.
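
For illustration, a minimal sketch of that cookie approach (the helper
names and the ZBUD_MAGIC value are made up; only PG_slab and the
page->private accessors from 3.11 are assumed):

	#include <linux/mm.h>
	#include <linux/page-flags.h>

	#define ZBUD_MAGIC	0x7a627564UL	/* "zbud"; arbitrary cookie value */

	static inline void set_page_zbud(struct page *page)
	{
		__SetPageSlab(page);			/* alias PG_zbud = PG_slab */
		set_page_private(page, ZBUD_MAGIC);	/* cookie: zbud, not real slab */
	}

	static inline bool page_is_zbud(struct page *page)
	{
		return PageSlab(page) && page_private(page) == ZBUD_MAGIC;
	}

	static inline void clear_page_zbud(struct page *page)
	{
		set_page_private(page, 0);
		__ClearPageSlab(page);
	}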

Best regards,
Krzysztof



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-06 18:51   ` Seth Jennings
@ 2013-08-07  7:31       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-08-07  7:31 UTC (permalink / raw)
  To: Seth Jennings
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Tomasz Stanislawski, Bob Liu

Hi Seth,

On wto, 2013-08-06 at 13:51 -0500, Seth Jennings wrote:
> I like the idea.  A few things below.  Also agree with Bob on the
> s/rebalance/adjust/ for rebalance_lists().
OK.

> s/else if/if/ since the if above returns if true.
Sure.

> > +		/* zbud_free() or zbud_alloc() */
> > +		int freechunks = num_free_chunks(zhdr);
> > +		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > +	} else {
> > +		/* zbud_alloc() */
> > +		list_add(&zhdr->buddy, &pool->buddied);
> > +	}
> > +	/* Add/move zbud page to beginning of LRU */
> > +	if (!list_empty(&zhdr->lru))
> > +		list_del(&zhdr->lru);
> 
> We don't want to reinsert to the LRU list if we have called zbud_free()
> on a zbud page that previously had two buddies.  This code causes the
> zbud page to move to the front of the LRU list which is not what we want.

Right, I'll fix it.


> > @@ -326,10 +370,10 @@ found:
> >  void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >  {
> >  	struct zbud_header *zhdr;
> > -	int freechunks;
> > 
> >  	spin_lock(&pool->lock);
> >  	zhdr = handle_to_zbud_header(handle);
> > +	BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0);
> 
> Not sure we need this.  Maybe, at most, VM_BUG_ON()?

Actually it is a leftover from debugging, so I don't mind removing it
completely.


> > @@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such case we succeeded with reclaim.
> > +			 */
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);
> 
> Again here.
I agree.


Thanks for comments,
Krzysztof



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
  2013-08-06 16:58     ` Dave Hansen
@ 2013-08-08  7:26       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-08-08  7:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

Hi,

On wto, 2013-08-06 at 09:58 -0700, Dave Hansen wrote:
> On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote:
> > +#ifdef CONFIG_ZBUD
> > +	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> > +	 * during migration/compaction.
> > +	 */
> > +	PG_zbud,
> > +#endif
> 
> Do you _really_ need an absolutely new, unshared page flag?
> The zbud code doesn't really look like it uses any of the space in
> 'struct page'.
> 
> I think you could pretty easily alias PG_zbud=PG_slab, then use the
> page->{private,slab_cache} (or some other unused field) in 'struct page'
> to store a cookie to differentiate slab and zbud pages.

How about using page->_mapcount with a negative value (-129), just like
PageBuddy() does?
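
Roughly mirroring the PageBuddy() helpers next to which this would live
in include/linux/mm.h (a sketch of the idea only; the -129 value is the
one proposed above, and the helper bodies are not from the patch set --
in 3.11 PAGE_BUDDY_MAPCOUNT_VALUE itself is -128):

	#define PAGE_ZBUD_MAPCOUNT_VALUE	(-129)

	static inline int PageZbud(struct page *page)
	{
		return atomic_read(&page->_mapcount) == PAGE_ZBUD_MAPCOUNT_VALUE;
	}

	static inline void __SetPageZbud(struct page *page)
	{
		/* only a freshly allocated, unmapped page may be marked */
		VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
		atomic_set(&page->_mapcount, PAGE_ZBUD_MAPCOUNT_VALUE);
	}

	static inline void __ClearPageZbud(struct page *page)
	{
		VM_BUG_ON(!PageZbud(page));
		atomic_set(&page->_mapcount, -1);
	}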


Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-09-09 19:47     ` Seth Jennings
@ 2013-09-10  7:16       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-09-10  7:16 UTC (permalink / raw)
  To: Seth Jennings
  Cc: linux-mm, linux-kernel, Andrew Morton, Bob Liu, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Dave Hansen, Minchan Kim, Tomasz Stanislawski

Hi,

On pon, 2013-09-09 at 14:47 -0500, Seth Jennings wrote:
> On Fri, Aug 30, 2013 at 10:42:53AM +0200, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
> 
> I like this idea.
> 
> > 
> > The page count is incremented when:
> >  - a handle is created and passed to zswap (in zbud_alloc()),
> >  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> > 
> > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> > Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> > Reviewed-by: Bob Liu <bob.liu@oracle.com>
> > ---
> >  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 52 insertions(+), 45 deletions(-)
> > 
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> >  	struct list_head lru;
> >  	unsigned int first_chunks;
> >  	unsigned int last_chunks;
> > -	bool under_reclaim;
> >  };
> > 
> >  /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> >  	zhdr->last_chunks = 0;
> >  	INIT_LIST_HEAD(&zhdr->buddy);
> >  	INIT_LIST_HEAD(&zhdr->lru);
> > -	zhdr->under_reclaim = 0;
> >  	return zhdr;
> >  }
> > 
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > -	__free_page(virt_to_page(zhdr));
> > -}
> > -
> >  /*
> >   * Encodes the handle of a particular buddy within a zbud page
> >   * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> >  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> >  }
> > 
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	struct page *page = virt_to_page(zhdr);
> > +	if (put_page_testzero(page)) {
> > +		free_hot_cold_page(page, 0);
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  /*****************
> >   * API Functions
> >  *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> >  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  			unsigned long *handle)
> >  {
> > -	int chunks, i, freechunks;
> > +	int chunks, i;
> 
> This change (make freechunks a block local variable) is unrelated to the
> patch description and should be its own patch.

Sure, I will split this into 2 patches.


[...]
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	}
> > 
> > +	put_zbud_page(zhdr);
> >  	spin_unlock(&pool->lock);
> >  }
> > 
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >   */
> >  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  {
> > -	int i, ret, freechunks;
> > +	int i, ret;
> >  	struct zbud_header *zhdr;
> >  	unsigned long first_handle = 0, last_handle = 0;
> > 
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such case we succeeded with reclaim.
> > +			 */
> > +			spin_unlock(&pool->lock);
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		/* Move this last element to beginning of LRU */
> >  		list_del(&zhdr->lru);
> > -		list_del(&zhdr->buddy);
> > +		list_add(&zhdr->lru, &pool->lru);
> 
> This adds the entry back to the lru list, but the entry is never removed
> from the list.  I'm not sure how this could work.

The entry will be removed from the LRU and the un/buddied list in
zbud_free(), called from the eviction handler.

So this actually works well; however, I wonder whether this may be a
source of race conditions, because during the call to the eviction
handler the zbud header is still present on the buddied list (until the
call to zbud_free()).

Another issue is that I haven't updated the documentation for
zbud_reclaim_page(). I'll fix this in the next version.


> >  		/* Protect zbud page against free */
> > -		zhdr->under_reclaim = true;
> > +		get_zbud_page(zhdr);
> >  		/*
> >  		 * We need encode the handles before unlocking, since we can
> >  		 * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  				goto next;
> >  		}
> >  next:
> > -		spin_lock(&pool->lock);
> > -		zhdr->under_reclaim = false;
> > -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -			/*
> > -			 * Both buddies are now free, free the zbud page and
> > -			 * return success.
> > -			 */
> > -			free_zbud_page(zhdr);
> > -			pool->pages_nr--;
> > -			spin_unlock(&pool->lock);
> > +		if (put_zbud_page(zhdr))
> 
> There is a single put_zbud_page() regardless of result of the eviction
> attempts.  What if both buddies of a buddied zbud page are evicted?
> Then you leak the zbud page.  What if the eviction in an unbuddied zbud
> page failed?  Then you prematurely free the page and crash later.

This single put_zbud_page() puts the reference obtained a few lines
earlier.

If one or two buddies are evicted, then the eviction handler should call
zbud_free() for them; zbud_free() will put the initial ref count
(obtained from zbud_alloc()).

If the eviction handler failed, then zbud_free() won't be called, so the
initial ref count will still be valid and this put_zbud_page() won't
free the page.

Let me show the flow of ref counts:
1. zbud_alloc() for first buddy, ref count = 1
2. zbud_alloc() for last buddy, ref count = 2
3. zbud_reclaim_page() start, ref count = 3
4. calling user eviction handler
5. zbud_free() for first buddy (called from handler), ref count = 2
6. zbud_free() for last buddy (called from handler), ref count = 1
7. zbud_reclaim_page() end, ref count = 0 (free the page)

If the eviction handler fails at step 5 or 6, then the ref count won't
drop to 0, as zswap still holds the initial reference for the handle.
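
Sketching the handler side of that flow (example_evict() and
writeback_entry() below are made-up names; only the zbud_ops-style
evict signature and zbud_free() from the patch are assumed):

	#include <linux/zbud.h>

	static int writeback_entry(unsigned long handle);	/* hypothetical */

	static int example_evict(struct zbud_pool *pool, unsigned long handle)
	{
		/* stand-in for the user-side writeback of the stored object */
		int ret = writeback_entry(handle);

		if (ret)
			return ret;		/* steps 5/6 skipped: zbud_alloc() ref stays */

		zbud_free(pool, handle);	/* puts the zbud_alloc() reference */
		return 0;
	}

zbud_reclaim_page() then drops its own reference (step 7) with the final
put_zbud_page(), so the page is only released once both buddies have
gone through zbud_free().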


Thanks for comments. I appreciate them very much.


Best regards,
Krzysztof




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-30  8:42   ` Krzysztof Kozlowski
@ 2013-09-09 19:47     ` Seth Jennings
  -1 siblings, 0 replies; 38+ messages in thread
From: Seth Jennings @ 2013-09-09 19:47 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-mm, linux-kernel, Andrew Morton, Bob Liu, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Dave Hansen, Minchan Kim, Tomasz Stanislawski

On Fri, Aug 30, 2013 at 10:42:53AM +0200, Krzysztof Kozlowski wrote:
> Use page reference counter for zbud pages. The ref counter replaces
> zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> when zbud_free() is called during reclaim. It allows implementation of
> additional reclaim paths.

I like this idea.

> 
> The page count is incremented when:
>  - a handle is created and passed to zswap (in zbud_alloc()),
>  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> Reviewed-by: Bob Liu <bob.liu@oracle.com>
> ---
>  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 52 insertions(+), 45 deletions(-)
> 
> diff --git a/mm/zbud.c b/mm/zbud.c
> index ad1e781..aa9a15c 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -109,7 +109,6 @@ struct zbud_header {
>  	struct list_head lru;
>  	unsigned int first_chunks;
>  	unsigned int last_chunks;
> -	bool under_reclaim;
>  };
> 
>  /*****************
> @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
>  	zhdr->last_chunks = 0;
>  	INIT_LIST_HEAD(&zhdr->buddy);
>  	INIT_LIST_HEAD(&zhdr->lru);
> -	zhdr->under_reclaim = 0;
>  	return zhdr;
>  }
> 
> -/* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> -{
> -	__free_page(virt_to_page(zhdr));
> -}
> -
>  /*
>   * Encodes the handle of a particular buddy within a zbud page
>   * Pool lock should be held as this function accesses first|last_chunks
> @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
>  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
>  }
> 
> +/*
> + * Increases ref count for zbud page.
> + */
> +static void get_zbud_page(struct zbud_header *zhdr)
> +{
> +	get_page(virt_to_page(zhdr));
> +}
> +
> +/*
> + * Decreases ref count for zbud page and frees the page if it reaches 0
> + * (no external references, e.g. handles).
> + *
> + * Returns 1 if page was freed and 0 otherwise.
> + */
> +static int put_zbud_page(struct zbud_header *zhdr)
> +{
> +	struct page *page = virt_to_page(zhdr);
> +	if (put_page_testzero(page)) {
> +		free_hot_cold_page(page, 0);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +
>  /*****************
>   * API Functions
>  *****************/
> @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
>  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  			unsigned long *handle)
>  {
> -	int chunks, i, freechunks;
> +	int chunks, i;

This change (make freechunks a block local variable) is unrelated to the
patch description and should be its own patch.

>  	struct zbud_header *zhdr = NULL;
>  	enum buddy bud;
>  	struct page *page;
> @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  				bud = FIRST;
>  			else
>  				bud = LAST;
> +			get_zbud_page(zhdr);
>  			goto found;
>  		}
>  	}
> @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  		return -ENOMEM;
>  	spin_lock(&pool->lock);
>  	pool->pages_nr++;
> +	/*
> +	 * We will be using zhdr instead of page, so
> +	 * don't increase the page count.
> +	 */
>  	zhdr = init_zbud_page(page);
>  	bud = FIRST;
> 
> @@ -295,7 +317,7 @@ found:
> 
>  	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
>  		/* Add to unbuddied list */
> -		freechunks = num_free_chunks(zhdr);
> +		int freechunks = num_free_chunks(zhdr);

Same here.

>  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
>  	} else {
>  		/* Add to buddied list */
> @@ -326,7 +348,6 @@ found:
>  void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  {
>  	struct zbud_header *zhdr;
> -	int freechunks;

And here.

> 
>  	spin_lock(&pool->lock);
>  	zhdr = handle_to_zbud_header(handle);
> @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  	else
>  		zhdr->first_chunks = 0;
> 
> -	if (zhdr->under_reclaim) {
> -		/* zbud page is under reclaim, reclaim will free */
> -		spin_unlock(&pool->lock);
> -		return;
> -	}
> -
>  	/* Remove from existing buddy list */
>  	list_del(&zhdr->buddy);
> 
>  	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> -		/* zbud page is empty, free */
>  		list_del(&zhdr->lru);
> -		free_zbud_page(zhdr);
>  		pool->pages_nr--;
>  	} else {
>  		/* Add to unbuddied list */
> -		freechunks = num_free_chunks(zhdr);
> +		int freechunks = num_free_chunks(zhdr);

Last one.

>  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
>  	}
> 
> +	put_zbud_page(zhdr);
>  	spin_unlock(&pool->lock);
>  }
> 
> @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
>   */
>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  {
> -	int i, ret, freechunks;
> +	int i, ret;
>  	struct zbud_header *zhdr;
>  	unsigned long first_handle = 0, last_handle = 0;
> 
> @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  		return -EINVAL;
>  	}
>  	for (i = 0; i < retries; i++) {
> +		if (list_empty(&pool->lru)) {
> +			/*
> +			 * LRU was emptied during evict calls in previous
> +			 * iteration but put_zbud_page() returned 0 meaning
> +			 * that someone still holds the page. This may
> +			 * happen when some other mm mechanism increased
> +			 * the page count.
> +			 * In such case we succeeded with reclaim.
> +			 */
> +			spin_unlock(&pool->lock);
> +			return 0;
> +		}
>  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> +		/* Move this last element to beginning of LRU */
>  		list_del(&zhdr->lru);
> -		list_del(&zhdr->buddy);
> +		list_add(&zhdr->lru, &pool->lru);

This adds the entry back to the lru list, but the entry is never removed
from the list.  I'm not sure how this could work.

>  		/* Protect zbud page against free */
> -		zhdr->under_reclaim = true;
> +		get_zbud_page(zhdr);
>  		/*
>  		 * We need encode the handles before unlocking, since we can
>  		 * race with free that will set (first|last)_chunks to 0
> @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  				goto next;
>  		}
>  next:
> -		spin_lock(&pool->lock);
> -		zhdr->under_reclaim = false;
> -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> -			/*
> -			 * Both buddies are now free, free the zbud page and
> -			 * return success.
> -			 */
> -			free_zbud_page(zhdr);
> -			pool->pages_nr--;
> -			spin_unlock(&pool->lock);
> +		if (put_zbud_page(zhdr))

There is a single put_zbud_page() regardless of the result of the
eviction attempts.  What if both buddies of a buddied zbud page are
evicted?
Then you leak the zbud page.  What if the eviction in an unbuddied zbud
page failed?  Then you prematurely free the page and crash later.

Seth

>  			return 0;
> -		} else if (zhdr->first_chunks == 0 ||
> -				zhdr->last_chunks == 0) {
> -			/* add to unbuddied list */
> -			freechunks = num_free_chunks(zhdr);
> -			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> -		} else {
> -			/* add to buddied list */
> -			list_add(&zhdr->buddy, &pool->buddied);
> -		}
> -
> -		/* add to beginning of LRU */
> -		list_add(&zhdr->lru, &pool->lru);
> +		spin_lock(&pool->lock);
>  	}
>  	spin_unlock(&pool->lock);
>  	return -EAGAIN;
> -- 
> 1.7.9.5
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-09-08  9:04     ` Bob Liu
@ 2013-09-09  6:14       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-09-09  6:14 UTC (permalink / raw)
  To: Bob Liu
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Dave Hansen, Minchan Kim, Tomasz Stanislawski

Hi Bob,


On nie, 2013-09-08 at 17:04 +0800, Bob Liu wrote:
> Hi Krzysztof,
> 
> On 08/30/2013 04:42 PM, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
> > 
> > The page count is incremented when:
> >  - a handle is created and passed to zswap (in zbud_alloc()),
> >  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> > 
> > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> > Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> > Reviewed-by: Bob Liu <bob.liu@oracle.com>
> 
> AFAIR, the previous version you sent out has a function  called
> rebalance_lists() which I think is a good clean up.
> But I didn't see that function any more in this version.

Yes, that function was added because similar code was present in
zbud_free/zbud_alloc/zbud_reclaim_page. I removed it because it turned
out that there is no benefit in generalizing this code. Seth found an
issue in that function (the zbud page was re-inserted into the LRU when
zbud_free() was called on one buddy). Fixing this issue added more
branches and more code, thus enlarging the rebalance_lists() function.
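
For reference, the open-coded zbud_free() path in the hunks quoted below
only adjusts the buddy lists and drops the reference; sketched here as a
helper (zbud_free_tail() is just an illustrative name, not a function
from the patch):

	static void zbud_free_tail(struct zbud_pool *pool, struct zbud_header *zhdr)
	{
		/* remove from whichever buddy list the page was on */
		list_del(&zhdr->buddy);

		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
			/* both buddies gone: drop LRU bookkeeping, the put below frees */
			list_del(&zhdr->lru);
			pool->pages_nr--;
		} else {
			/* one buddy left: back onto the matching unbuddied list */
			int freechunks = num_free_chunks(zhdr);

			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
		}
		/*
		 * Note: no list_add(&zhdr->lru, ...) here; LRU placement is left
		 * to zbud_alloc() and zbud_reclaim_page(), which avoids the
		 * re-insertion problem described above.
		 */
		put_zbud_page(zhdr);
	}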

Best regards,
Krzysztof

> Thanks,
> -Bob
> 
> 
> > ---
> >  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 52 insertions(+), 45 deletions(-)
> > 
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> >  	struct list_head lru;
> >  	unsigned int first_chunks;
> >  	unsigned int last_chunks;
> > -	bool under_reclaim;
> >  };
> >  
> >  /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> >  	zhdr->last_chunks = 0;
> >  	INIT_LIST_HEAD(&zhdr->buddy);
> >  	INIT_LIST_HEAD(&zhdr->lru);
> > -	zhdr->under_reclaim = 0;
> >  	return zhdr;
> >  }
> >  
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > -	__free_page(virt_to_page(zhdr));
> > -}
> > -
> >  /*
> >   * Encodes the handle of a particular buddy within a zbud page
> >   * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> >  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> >  }
> >  
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	struct page *page = virt_to_page(zhdr);
> > +	if (put_page_testzero(page)) {
> > +		free_hot_cold_page(page, 0);
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  /*****************
> >   * API Functions
> >  *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> >  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  			unsigned long *handle)
> >  {
> > -	int chunks, i, freechunks;
> > +	int chunks, i;
> >  	struct zbud_header *zhdr = NULL;
> >  	enum buddy bud;
> >  	struct page *page;
> > @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  				bud = FIRST;
> >  			else
> >  				bud = LAST;
> > +			get_zbud_page(zhdr);
> >  			goto found;
> >  		}
> >  	}
> > @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  		return -ENOMEM;
> >  	spin_lock(&pool->lock);
> >  	pool->pages_nr++;
> > +	/*
> > +	 * We will be using zhdr instead of page, so
> > +	 * don't increase the page count.
> > +	 */
> >  	zhdr = init_zbud_page(page);
> >  	bud = FIRST;
> >  
> > @@ -295,7 +317,7 @@ found:
> >  
> >  	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
> >  		/* Add to unbuddied list */
> > -		freechunks = num_free_chunks(zhdr);
> > +		int freechunks = num_free_chunks(zhdr);
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	} else {
> >  		/* Add to buddied list */
> > @@ -326,7 +348,6 @@ found:
> >  void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >  {
> >  	struct zbud_header *zhdr;
> > -	int freechunks;
> >  
> >  	spin_lock(&pool->lock);
> >  	zhdr = handle_to_zbud_header(handle);
> > @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >  	else
> >  		zhdr->first_chunks = 0;
> >  
> > -	if (zhdr->under_reclaim) {
> > -		/* zbud page is under reclaim, reclaim will free */
> > -		spin_unlock(&pool->lock);
> > -		return;
> > -	}
> > -
> >  	/* Remove from existing buddy list */
> >  	list_del(&zhdr->buddy);
> >  
> >  	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -		/* zbud page is empty, free */
> >  		list_del(&zhdr->lru);
> > -		free_zbud_page(zhdr);
> >  		pool->pages_nr--;
> >  	} else {
> >  		/* Add to unbuddied list */
> > -		freechunks = num_free_chunks(zhdr);
> > +		int freechunks = num_free_chunks(zhdr);
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	}
> >  
> > +	put_zbud_page(zhdr);
> >  	spin_unlock(&pool->lock);
> >  }
> >  
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >   */
> >  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  {
> > -	int i, ret, freechunks;
> > +	int i, ret;
> >  	struct zbud_header *zhdr;
> >  	unsigned long first_handle = 0, last_handle = 0;
> >  
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such case we succeeded with reclaim.
> > +			 */
> > +			spin_unlock(&pool->lock);
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		/* Move this last element to beginning of LRU */
> >  		list_del(&zhdr->lru);
> > -		list_del(&zhdr->buddy);
> > +		list_add(&zhdr->lru, &pool->lru);
> >  		/* Protect zbud page against free */
> > -		zhdr->under_reclaim = true;
> > +		get_zbud_page(zhdr);
> >  		/*
> >  		 * We need encode the handles before unlocking, since we can
> >  		 * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  				goto next;
> >  		}
> >  next:
> > -		spin_lock(&pool->lock);
> > -		zhdr->under_reclaim = false;
> > -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -			/*
> > -			 * Both buddies are now free, free the zbud page and
> > -			 * return success.
> > -			 */
> > -			free_zbud_page(zhdr);
> > -			pool->pages_nr--;
> > -			spin_unlock(&pool->lock);
> > +		if (put_zbud_page(zhdr))
> >  			return 0;
> > -		} else if (zhdr->first_chunks == 0 ||
> > -				zhdr->last_chunks == 0) {
> > -			/* add to unbuddied list */
> > -			freechunks = num_free_chunks(zhdr);
> > -			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > -		} else {
> > -			/* add to buddied list */
> > -			list_add(&zhdr->buddy, &pool->buddied);
> > -		}
> > -
> > -		/* add to beginning of LRU */
> > -		list_add(&zhdr->lru, &pool->lru);
> > +		spin_lock(&pool->lock);
> >  	}
> >  	spin_unlock(&pool->lock);
> >  	return -EAGAIN;
> > 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
@ 2013-09-09  6:14       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-09-09  6:14 UTC (permalink / raw)
  To: Bob Liu
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Dave Hansen, Minchan Kim, Tomasz Stanislawski

Hi Bob,


On nie, 2013-09-08 at 17:04 +0800, Bob Liu wrote:
> Hi Krzysztof,
> 
> On 08/30/2013 04:42 PM, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
> > 
> > The page count is incremented when:
> >  - a handle is created and passed to zswap (in zbud_alloc()),
> >  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> > 
> > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> > Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> > Reviewed-by: Bob Liu <bob.liu@oracle.com>
> 
> AFAIR, the previous version you sent out has a function  called
> rebalance_lists() which I think is a good clean up.
> But I didn't see that function any more in this version.

Yes, that function was added because similar code was present in
zbud_free/zbud_alloc/zbud_reclaim_page. I removed it because it turned
out that there is no benefit in generalizing this code. Seth found an
issue in this function (zbud page was re-inserted in to LRU when
zbud_free() was called on one buddy). Fixing this issue added more if-s
and more code thus enlarging the rebalance_lists() function.

Best regards,
Krzysztof

> Thanks,
> -Bob
> 
> 
> > ---
> >  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 52 insertions(+), 45 deletions(-)
> > 
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> >  	struct list_head lru;
> >  	unsigned int first_chunks;
> >  	unsigned int last_chunks;
> > -	bool under_reclaim;
> >  };
> >  
> >  /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> >  	zhdr->last_chunks = 0;
> >  	INIT_LIST_HEAD(&zhdr->buddy);
> >  	INIT_LIST_HEAD(&zhdr->lru);
> > -	zhdr->under_reclaim = 0;
> >  	return zhdr;
> >  }
> >  
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > -	__free_page(virt_to_page(zhdr));
> > -}
> > -
> >  /*
> >   * Encodes the handle of a particular buddy within a zbud page
> >   * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> >  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> >  }
> >  
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	struct page *page = virt_to_page(zhdr);
> > +	if (put_page_testzero(page)) {
> > +		free_hot_cold_page(page, 0);
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  /*****************
> >   * API Functions
> >  *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> >  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  			unsigned long *handle)
> >  {
> > -	int chunks, i, freechunks;
> > +	int chunks, i;
> >  	struct zbud_header *zhdr = NULL;
> >  	enum buddy bud;
> >  	struct page *page;
> > @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  				bud = FIRST;
> >  			else
> >  				bud = LAST;
> > +			get_zbud_page(zhdr);
> >  			goto found;
> >  		}
> >  	}
> > @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  		return -ENOMEM;
> >  	spin_lock(&pool->lock);
> >  	pool->pages_nr++;
> > +	/*
> > +	 * We will be using zhdr instead of page, so
> > +	 * don't increase the page count.
> > +	 */
> >  	zhdr = init_zbud_page(page);
> >  	bud = FIRST;
> >  
> > @@ -295,7 +317,7 @@ found:
> >  
> >  	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
> >  		/* Add to unbuddied list */
> > -		freechunks = num_free_chunks(zhdr);
> > +		int freechunks = num_free_chunks(zhdr);
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	} else {
> >  		/* Add to buddied list */
> > @@ -326,7 +348,6 @@ found:
> >  void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >  {
> >  	struct zbud_header *zhdr;
> > -	int freechunks;
> >  
> >  	spin_lock(&pool->lock);
> >  	zhdr = handle_to_zbud_header(handle);
> > @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >  	else
> >  		zhdr->first_chunks = 0;
> >  
> > -	if (zhdr->under_reclaim) {
> > -		/* zbud page is under reclaim, reclaim will free */
> > -		spin_unlock(&pool->lock);
> > -		return;
> > -	}
> > -
> >  	/* Remove from existing buddy list */
> >  	list_del(&zhdr->buddy);
> >  
> >  	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -		/* zbud page is empty, free */
> >  		list_del(&zhdr->lru);
> > -		free_zbud_page(zhdr);
> >  		pool->pages_nr--;
> >  	} else {
> >  		/* Add to unbuddied list */
> > -		freechunks = num_free_chunks(zhdr);
> > +		int freechunks = num_free_chunks(zhdr);
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	}
> >  
> > +	put_zbud_page(zhdr);
> >  	spin_unlock(&pool->lock);
> >  }
> >  
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >   */
> >  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  {
> > -	int i, ret, freechunks;
> > +	int i, ret;
> >  	struct zbud_header *zhdr;
> >  	unsigned long first_handle = 0, last_handle = 0;
> >  
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such a case, reclaim has succeeded.
> > +			 */
> > +			spin_unlock(&pool->lock);
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		/* Move this last element to beginning of LRU */
> >  		list_del(&zhdr->lru);
> > -		list_del(&zhdr->buddy);
> > +		list_add(&zhdr->lru, &pool->lru);
> >  		/* Protect zbud page against free */
> > -		zhdr->under_reclaim = true;
> > +		get_zbud_page(zhdr);
> >  		/*
> >  		 * We need encode the handles before unlocking, since we can
> >  		 * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  				goto next;
> >  		}
> >  next:
> > -		spin_lock(&pool->lock);
> > -		zhdr->under_reclaim = false;
> > -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -			/*
> > -			 * Both buddies are now free, free the zbud page and
> > -			 * return success.
> > -			 */
> > -			free_zbud_page(zhdr);
> > -			pool->pages_nr--;
> > -			spin_unlock(&pool->lock);
> > +		if (put_zbud_page(zhdr))
> >  			return 0;
> > -		} else if (zhdr->first_chunks == 0 ||
> > -				zhdr->last_chunks == 0) {
> > -			/* add to unbuddied list */
> > -			freechunks = num_free_chunks(zhdr);
> > -			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > -		} else {
> > -			/* add to buddied list */
> > -			list_add(&zhdr->buddy, &pool->buddied);
> > -		}
> > -
> > -		/* add to beginning of LRU */
> > -		list_add(&zhdr->lru, &pool->lru);
> > +		spin_lock(&pool->lock);
> >  	}
> >  	spin_unlock(&pool->lock);
> >  	return -EAGAIN;
> > 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-30  8:42   ` Krzysztof Kozlowski
@ 2013-09-08  9:04     ` Bob Liu
  -1 siblings, 0 replies; 38+ messages in thread
From: Bob Liu @ 2013-09-08  9:04 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Seth Jennings, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park,
	Dave Hansen, Minchan Kim, Tomasz Stanislawski

Hi Krzysztof,

On 08/30/2013 04:42 PM, Krzysztof Kozlowski wrote:
> Use a page reference counter for zbud pages. The ref counter replaces the
> zbud_header.under_reclaim flag and ensures that a zbud page won't be freed
> when zbud_free() is called during reclaim. It allows the implementation of
> additional reclaim paths.
> 
> The page count is incremented when:
>  - a handle is created and passed to zswap (in zbud_alloc()),
>  - the user-supplied eviction callback is called (in zbud_reclaim_page()).
> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> Reviewed-by: Bob Liu <bob.liu@oracle.com>

AFAIR, the previous version you sent out had a function called
rebalance_lists(), which I think was a good cleanup.
But I don't see that function anymore in this version.
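
For reference, here is a rough sketch of what such a helper could look
like, reconstructed only from the list handling visible in this patch;
the name rebalance_lists() and its exact shape are my guess, not taken
from your earlier series:

static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr)
{
	/* Caller holds pool->lock and already did list_del(&zhdr->buddy) */
	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
		/*
		 * zbud page is empty; drop it from the LRU. The caller's
		 * final put_zbud_page() frees the page itself.
		 */
		list_del(&zhdr->lru);
		pool->pages_nr--;
	} else if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
		/* Exactly one buddy is in use, add to unbuddied list */
		int freechunks = num_free_chunks(zhdr);

		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
	} else {
		/* Both buddies are in use, add to buddied list */
		list_add(&zhdr->buddy, &pool->buddied);
	}
}

Calling something like that from both zbud_free() and the reclaim path
would avoid duplicating the buddied/unbuddied bookkeeping.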

Thanks,
-Bob


> ---
>  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 52 insertions(+), 45 deletions(-)
> 
> diff --git a/mm/zbud.c b/mm/zbud.c
> index ad1e781..aa9a15c 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -109,7 +109,6 @@ struct zbud_header {
>  	struct list_head lru;
>  	unsigned int first_chunks;
>  	unsigned int last_chunks;
> -	bool under_reclaim;
>  };
>  
>  /*****************
> @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
>  	zhdr->last_chunks = 0;
>  	INIT_LIST_HEAD(&zhdr->buddy);
>  	INIT_LIST_HEAD(&zhdr->lru);
> -	zhdr->under_reclaim = 0;
>  	return zhdr;
>  }
>  
> -/* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> -{
> -	__free_page(virt_to_page(zhdr));
> -}
> -
>  /*
>   * Encodes the handle of a particular buddy within a zbud page
>   * Pool lock should be held as this function accesses first|last_chunks
> @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
>  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
>  }
>  
> +/*
> + * Increases ref count for zbud page.
> + */
> +static void get_zbud_page(struct zbud_header *zhdr)
> +{
> +	get_page(virt_to_page(zhdr));
> +}
> +
> +/*
> + * Decreases ref count for zbud page and frees the page if it reaches 0
> + * (no external references, e.g. handles).
> + *
> + * Returns 1 if page was freed and 0 otherwise.
> + */
> +static int put_zbud_page(struct zbud_header *zhdr)
> +{
> +	struct page *page = virt_to_page(zhdr);
> +	if (put_page_testzero(page)) {
> +		free_hot_cold_page(page, 0);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +
>  /*****************
>   * API Functions
>  *****************/
> @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
>  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  			unsigned long *handle)
>  {
> -	int chunks, i, freechunks;
> +	int chunks, i;
>  	struct zbud_header *zhdr = NULL;
>  	enum buddy bud;
>  	struct page *page;
> @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  				bud = FIRST;
>  			else
>  				bud = LAST;
> +			get_zbud_page(zhdr);
>  			goto found;
>  		}
>  	}
> @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
>  		return -ENOMEM;
>  	spin_lock(&pool->lock);
>  	pool->pages_nr++;
> +	/*
> +	 * We will be using zhdr instead of page, so
> +	 * don't increase the page count.
> +	 */
>  	zhdr = init_zbud_page(page);
>  	bud = FIRST;
>  
> @@ -295,7 +317,7 @@ found:
>  
>  	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
>  		/* Add to unbuddied list */
> -		freechunks = num_free_chunks(zhdr);
> +		int freechunks = num_free_chunks(zhdr);
>  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
>  	} else {
>  		/* Add to buddied list */
> @@ -326,7 +348,6 @@ found:
>  void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  {
>  	struct zbud_header *zhdr;
> -	int freechunks;
>  
>  	spin_lock(&pool->lock);
>  	zhdr = handle_to_zbud_header(handle);
> @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  	else
>  		zhdr->first_chunks = 0;
>  
> -	if (zhdr->under_reclaim) {
> -		/* zbud page is under reclaim, reclaim will free */
> -		spin_unlock(&pool->lock);
> -		return;
> -	}
> -
>  	/* Remove from existing buddy list */
>  	list_del(&zhdr->buddy);
>  
>  	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> -		/* zbud page is empty, free */
>  		list_del(&zhdr->lru);
> -		free_zbud_page(zhdr);
>  		pool->pages_nr--;
>  	} else {
>  		/* Add to unbuddied list */
> -		freechunks = num_free_chunks(zhdr);
> +		int freechunks = num_free_chunks(zhdr);
>  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
>  	}
>  
> +	put_zbud_page(zhdr);
>  	spin_unlock(&pool->lock);
>  }
>  
> @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
>   */
>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  {
> -	int i, ret, freechunks;
> +	int i, ret;
>  	struct zbud_header *zhdr;
>  	unsigned long first_handle = 0, last_handle = 0;
>  
> @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  		return -EINVAL;
>  	}
>  	for (i = 0; i < retries; i++) {
> +		if (list_empty(&pool->lru)) {
> +			/*
> +			 * LRU was emptied during evict calls in previous
> +			 * iteration but put_zbud_page() returned 0 meaning
> +			 * that someone still holds the page. This may
> +			 * happen when some other mm mechanism increased
> +			 * the page count.
> +			 * In such a case, reclaim has succeeded.
> +			 */
> +			spin_unlock(&pool->lock);
> +			return 0;
> +		}
>  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> +		/* Move this last element to beginning of LRU */
>  		list_del(&zhdr->lru);
> -		list_del(&zhdr->buddy);
> +		list_add(&zhdr->lru, &pool->lru);
>  		/* Protect zbud page against free */
> -		zhdr->under_reclaim = true;
> +		get_zbud_page(zhdr);
>  		/*
>  		 * We need encode the handles before unlocking, since we can
>  		 * race with free that will set (first|last)_chunks to 0
> @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
>  				goto next;
>  		}
>  next:
> -		spin_lock(&pool->lock);
> -		zhdr->under_reclaim = false;
> -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> -			/*
> -			 * Both buddies are now free, free the zbud page and
> -			 * return success.
> -			 */
> -			free_zbud_page(zhdr);
> -			pool->pages_nr--;
> -			spin_unlock(&pool->lock);
> +		if (put_zbud_page(zhdr))
>  			return 0;
> -		} else if (zhdr->first_chunks == 0 ||
> -				zhdr->last_chunks == 0) {
> -			/* add to unbuddied list */
> -			freechunks = num_free_chunks(zhdr);
> -			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> -		} else {
> -			/* add to buddied list */
> -			list_add(&zhdr->buddy, &pool->buddied);
> -		}
> -
> -		/* add to beginning of LRU */
> -		list_add(&zhdr->lru, &pool->lru);
> +		spin_lock(&pool->lock);
>  	}
>  	spin_unlock(&pool->lock);
>  	return -EAGAIN;
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
  2013-08-30  8:42 [RFC PATCH 0/4] mm: migrate zbud pages Krzysztof Kozlowski
@ 2013-08-30  8:42   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 38+ messages in thread
From: Krzysztof Kozlowski @ 2013-08-30  8:42 UTC (permalink / raw)
  To: Seth Jennings, linux-mm, linux-kernel, Andrew Morton
  Cc: Bob Liu, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski,
	Kyungmin Park, Dave Hansen, Minchan Kim, Krzysztof Kozlowski,
	Tomasz Stanislawski

Use a page reference counter for zbud pages. The ref counter replaces the
zbud_header.under_reclaim flag and ensures that a zbud page won't be freed
when zbud_free() is called during reclaim. It allows the implementation of
additional reclaim paths.

The page count is incremented when:
 - a handle is created and passed to zswap (in zbud_alloc()),
 - the user-supplied eviction callback is called (in zbud_reclaim_page()).
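
To make the pairing above explicit, here is an illustration from a
caller's point of view. It is not part of the patch: the caller context
is made up, only zbud_alloc()/zbud_free() and their reference semantics
come from this series.

unsigned long handle;
int ret;

/* On success the new handle owns one reference on the zbud page */
ret = zbud_alloc(pool, len, GFP_KERNEL, &handle);
if (ret)
	return ret;

/* ... keep the handle, e.g. in a zswap entry ... */

/*
 * Drops the handle's reference. The page is returned to the page
 * allocator only when the last reference is put, e.g. after a
 * concurrent zbud_reclaim_page() drops its temporary reference.
 */
zbud_free(pool, handle);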

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
---
 mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 52 insertions(+), 45 deletions(-)

diff --git a/mm/zbud.c b/mm/zbud.c
index ad1e781..aa9a15c 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -109,7 +109,6 @@ struct zbud_header {
 	struct list_head lru;
 	unsigned int first_chunks;
 	unsigned int last_chunks;
-	bool under_reclaim;
 };
 
 /*****************
@@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
 	zhdr->last_chunks = 0;
 	INIT_LIST_HEAD(&zhdr->buddy);
 	INIT_LIST_HEAD(&zhdr->lru);
-	zhdr->under_reclaim = 0;
 	return zhdr;
 }
 
-/* Resets the struct page fields and frees the page */
-static void free_zbud_page(struct zbud_header *zhdr)
-{
-	__free_page(virt_to_page(zhdr));
-}
-
 /*
  * Encodes the handle of a particular buddy within a zbud page
  * Pool lock should be held as this function accesses first|last_chunks
@@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
 	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
 }
 
+/*
+ * Increases ref count for zbud page.
+ */
+static void get_zbud_page(struct zbud_header *zhdr)
+{
+	get_page(virt_to_page(zhdr));
+}
+
+/*
+ * Decreases ref count for zbud page and frees the page if it reaches 0
+ * (no external references, e.g. handles).
+ *
+ * Returns 1 if page was freed and 0 otherwise.
+ */
+static int put_zbud_page(struct zbud_header *zhdr)
+{
+	struct page *page = virt_to_page(zhdr);
+	if (put_page_testzero(page)) {
+		free_hot_cold_page(page, 0);
+		return 1;
+	}
+	return 0;
+}
+
+
 /*****************
  * API Functions
 *****************/
@@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
 int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 			unsigned long *handle)
 {
-	int chunks, i, freechunks;
+	int chunks, i;
 	struct zbud_header *zhdr = NULL;
 	enum buddy bud;
 	struct page *page;
@@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 				bud = FIRST;
 			else
 				bud = LAST;
+			get_zbud_page(zhdr);
 			goto found;
 		}
 	}
@@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 		return -ENOMEM;
 	spin_lock(&pool->lock);
 	pool->pages_nr++;
+	/*
+	 * We will be using zhdr instead of page, so
+	 * don't increase the page count.
+	 */
 	zhdr = init_zbud_page(page);
 	bud = FIRST;
 
@@ -295,7 +317,7 @@ found:
 
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
 		/* Add to unbuddied list */
-		freechunks = num_free_chunks(zhdr);
+		int freechunks = num_free_chunks(zhdr);
 		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
 	} else {
 		/* Add to buddied list */
@@ -326,7 +348,6 @@ found:
 void zbud_free(struct zbud_pool *pool, unsigned long handle)
 {
 	struct zbud_header *zhdr;
-	int freechunks;
 
 	spin_lock(&pool->lock);
 	zhdr = handle_to_zbud_header(handle);
@@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
 	else
 		zhdr->first_chunks = 0;
 
-	if (zhdr->under_reclaim) {
-		/* zbud page is under reclaim, reclaim will free */
-		spin_unlock(&pool->lock);
-		return;
-	}
-
 	/* Remove from existing buddy list */
 	list_del(&zhdr->buddy);
 
 	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-		/* zbud page is empty, free */
 		list_del(&zhdr->lru);
-		free_zbud_page(zhdr);
 		pool->pages_nr--;
 	} else {
 		/* Add to unbuddied list */
-		freechunks = num_free_chunks(zhdr);
+		int freechunks = num_free_chunks(zhdr);
 		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
 	}
 
+	put_zbud_page(zhdr);
 	spin_unlock(&pool->lock);
 }
 
@@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
  */
 int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 {
-	int i, ret, freechunks;
+	int i, ret;
 	struct zbud_header *zhdr;
 	unsigned long first_handle = 0, last_handle = 0;
 
@@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 		return -EINVAL;
 	}
 	for (i = 0; i < retries; i++) {
+		if (list_empty(&pool->lru)) {
+			/*
+			 * LRU was emptied during evict calls in previous
+			 * iteration but put_zbud_page() returned 0 meaning
+			 * that someone still holds the page. This may
+			 * happen when some other mm mechanism increased
+			 * the page count.
+			 * In such a case, reclaim has succeeded.
+			 */
+			spin_unlock(&pool->lock);
+			return 0;
+		}
 		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
+		/* Move this last element to beginning of LRU */
 		list_del(&zhdr->lru);
-		list_del(&zhdr->buddy);
+		list_add(&zhdr->lru, &pool->lru);
 		/* Protect zbud page against free */
-		zhdr->under_reclaim = true;
+		get_zbud_page(zhdr);
 		/*
 		 * We need encode the handles before unlocking, since we can
 		 * race with free that will set (first|last)_chunks to 0
@@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 				goto next;
 		}
 next:
-		spin_lock(&pool->lock);
-		zhdr->under_reclaim = false;
-		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-			/*
-			 * Both buddies are now free, free the zbud page and
-			 * return success.
-			 */
-			free_zbud_page(zhdr);
-			pool->pages_nr--;
-			spin_unlock(&pool->lock);
+		if (put_zbud_page(zhdr))
 			return 0;
-		} else if (zhdr->first_chunks == 0 ||
-				zhdr->last_chunks == 0) {
-			/* add to unbuddied list */
-			freechunks = num_free_chunks(zhdr);
-			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-		} else {
-			/* add to buddied list */
-			list_add(&zhdr->buddy, &pool->buddied);
-		}
-
-		/* add to beginning of LRU */
-		list_add(&zhdr->lru, &pool->lru);
+		spin_lock(&pool->lock);
 	}
 	spin_unlock(&pool->lock);
 	return -EAGAIN;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2013-09-10  7:16 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-06  6:42 [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction Krzysztof Kozlowski
2013-08-06  6:42 ` Krzysztof Kozlowski
2013-08-06  6:42 ` [RFC PATCH 1/4] zbud: use page ref counter for zbud pages Krzysztof Kozlowski
2013-08-06  6:42   ` Krzysztof Kozlowski
2013-08-06  9:00   ` Bob Liu
2013-08-06  9:00     ` Bob Liu
2013-08-06  9:25     ` Krzysztof Kozlowski
2013-08-06  9:25       ` Krzysztof Kozlowski
2013-08-06 18:51   ` Seth Jennings
2013-08-07  7:31     ` Krzysztof Kozlowski
2013-08-07  7:31       ` Krzysztof Kozlowski
2013-08-06  6:42 ` [RFC PATCH 2/4] mm: split code for unusing swap entries from try_to_unuse Krzysztof Kozlowski
2013-08-06  6:42   ` Krzysztof Kozlowski
2013-08-06  6:42 ` [RFC PATCH 3/4] mm: add zbud flag to page flags Krzysztof Kozlowski
2013-08-06  6:42   ` Krzysztof Kozlowski
2013-08-06 16:58   ` Dave Hansen
2013-08-06 16:58     ` Dave Hansen
2013-08-07  7:04     ` Krzysztof Kozlowski
2013-08-07  7:04       ` Krzysztof Kozlowski
2013-08-08  7:26     ` Krzysztof Kozlowski
2013-08-08  7:26       ` Krzysztof Kozlowski
2013-08-06 18:57   ` Seth Jennings
2013-08-06  6:42 ` [RFC PATCH 4/4] mm: reclaim zbud pages on migration and compaction Krzysztof Kozlowski
2013-08-06  6:42   ` Krzysztof Kozlowski
2013-08-06  9:16 ` [RFC PATCH 0/4] " Bob Liu
2013-08-06  9:16   ` Bob Liu
2013-08-06 13:05   ` Krzysztof Kozlowski
2013-08-06 13:05     ` Krzysztof Kozlowski
2013-08-30  8:42 [RFC PATCH 0/4] mm: migrate zbud pages Krzysztof Kozlowski
2013-08-30  8:42 ` [RFC PATCH 1/4] zbud: use page ref counter for " Krzysztof Kozlowski
2013-08-30  8:42   ` Krzysztof Kozlowski
2013-09-08  9:04   ` Bob Liu
2013-09-08  9:04     ` Bob Liu
2013-09-09  6:14     ` Krzysztof Kozlowski
2013-09-09  6:14       ` Krzysztof Kozlowski
2013-09-09 19:47   ` Seth Jennings
2013-09-09 19:47     ` Seth Jennings
2013-09-10  7:16     ` Krzysztof Kozlowski
2013-09-10  7:16       ` Krzysztof Kozlowski
