linux-kernel.vger.kernel.org archive mirror
* [PATCH/RFC] z3fold: use per-page read/write lock
@ 2016-11-05 13:49 Vitaly Wool
  2016-11-05 23:38 ` Andi Kleen
  2016-11-10 19:38 ` Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Vitaly Wool @ 2016-11-05 13:49 UTC (permalink / raw)
  To: Linux-MM, linux-kernel; +Cc: Dan Streetman, Andrew Morton


Most z3fold operations are in-page, such as modifying the z3fold
page header or moving z3fold objects within a page. Taking the
per-pool spinlock to protect per-page objects is therefore
suboptimal, and the idea of having a per-page spinlock (or rwlock)
has been around for some time. However, adding one directly to the
z3fold header makes the latter too big on some systems to fit in
a single chunk.

This patch implements a custom per-page read/write locking mechanism
which is lightweight enough to fit into the z3fold header.

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
---
 mm/z3fold.c | 148 ++++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 105 insertions(+), 43 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index fea6791..3e30930 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -23,6 +23,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/module.h>
@@ -98,6 +99,7 @@ enum buddy {
  * struct z3fold_header - z3fold page metadata occupying the first chunk of each
  *			z3fold page, except for HEADLESS pages
  * @buddy:	links the z3fold page into the relevant list in the pool
+ * @page_lock:		per-page atomic variable used for locking
  * @first_chunks:	the size of the first buddy in chunks, 0 if free
  * @middle_chunks:	the size of the middle buddy in chunks, 0 if free
  * @last_chunks:	the size of the last buddy in chunks, 0 if free
@@ -105,6 +107,7 @@ enum buddy {
  */
 struct z3fold_header {
 	struct list_head buddy;
+	atomic_t page_lock;
 	unsigned short first_chunks;
 	unsigned short middle_chunks;
 	unsigned short last_chunks;
@@ -144,6 +147,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page)
 	clear_bit(PAGE_HEADLESS, &page->private);
 	clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
 
+	atomic_set(&zhdr->page_lock, 0);
 	zhdr->first_chunks = 0;
 	zhdr->middle_chunks = 0;
 	zhdr->last_chunks = 0;
@@ -159,6 +163,39 @@ static void free_z3fold_page(struct z3fold_header *zhdr)
 	__free_page(virt_to_page(zhdr));
 }
 
+#define Z3FOLD_PAGE_WRITE_FLAG	(1 << 15)
+
+/* Read-lock a z3fold page */
+static void z3fold_page_rlock(struct z3fold_header *zhdr)
+{
+	while (!atomic_add_unless(&zhdr->page_lock, 1, Z3FOLD_PAGE_WRITE_FLAG))
+		cpu_relax();
+	smp_mb();
+}
+
+/* Read-unlock a z3fold page */
+static void z3fold_page_runlock(struct z3fold_header *zhdr)
+{
+	atomic_dec(&zhdr->page_lock);
+	smp_mb();
+}
+
+/* Write-lock a z3fold page */
+static void z3fold_page_wlock(struct z3fold_header *zhdr)
+{
+	while (atomic_cmpxchg(&zhdr->page_lock, 0, Z3FOLD_PAGE_WRITE_FLAG) != 0)
+		cpu_relax();
+	smp_mb();
+}
+
+/* Write-unlock a z3fold page */
+static void z3fold_page_wunlock(struct z3fold_header *zhdr)
+{
+	atomic_sub(Z3FOLD_PAGE_WRITE_FLAG, &zhdr->page_lock);
+	smp_mb();
+}
+
+
 /*
  * Encodes the handle of a particular buddy within a z3fold page
  * Pool lock should be held as this function accesses first_num
@@ -343,50 +380,60 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
 		bud = HEADLESS;
 	else {
 		chunks = size_to_chunks(size);
-		spin_lock(&pool->lock);
 
 		/* First, try to find an unbuddied z3fold page. */
 		zhdr = NULL;
 		for_each_unbuddied_list(i, chunks) {
-			if (!list_empty(&pool->unbuddied[i])) {
-				zhdr = list_first_entry(&pool->unbuddied[i],
+			spin_lock(&pool->lock);
+			zhdr = list_first_entry_or_null(&pool->unbuddied[i],
 						struct z3fold_header, buddy);
-				page = virt_to_page(zhdr);
-				if (zhdr->first_chunks == 0) {
-					if (zhdr->middle_chunks != 0 &&
-					    chunks >= zhdr->start_middle)
-						bud = LAST;
-					else
-						bud = FIRST;
-				} else if (zhdr->last_chunks == 0)
+			if (!zhdr) {
+				spin_unlock(&pool->lock);
+				continue;
+			}
+			list_del(&zhdr->buddy);
+			spin_unlock(&pool->lock);
+
+			page = virt_to_page(zhdr);
+			z3fold_page_wlock(zhdr);
+			if (zhdr->first_chunks == 0) {
+				if (zhdr->middle_chunks != 0 &&
+				    chunks >= zhdr->start_middle)
 					bud = LAST;
-				else if (zhdr->middle_chunks == 0)
-					bud = MIDDLE;
-				else {
-					pr_err("No free chunks in unbuddied\n");
-					WARN_ON(1);
-					continue;
-				}
-				list_del(&zhdr->buddy);
-				goto found;
+				else
+					bud = FIRST;
+			} else if (zhdr->last_chunks == 0)
+				bud = LAST;
+			else if (zhdr->middle_chunks == 0)
+				bud = MIDDLE;
+			else {
+				spin_lock(&pool->lock);
+				list_add(&zhdr->buddy, &pool->buddied);
+				spin_unlock(&pool->lock);
+				z3fold_page_wunlock(zhdr);
+				pr_err("No free chunks in unbuddied\n");
+				WARN_ON(1);
+				continue;
 			}
+			goto found;
 		}
 		bud = FIRST;
-		spin_unlock(&pool->lock);
 	}
 
 	/* Couldn't find unbuddied z3fold page, create new one */
 	page = alloc_page(gfp);
 	if (!page)
 		return -ENOMEM;
-	spin_lock(&pool->lock);
+
 	atomic64_inc(&pool->pages_nr);
 	zhdr = init_z3fold_page(page);
 
 	if (bud == HEADLESS) {
 		set_bit(PAGE_HEADLESS, &page->private);
+		spin_lock(&pool->lock);
 		goto headless;
 	}
+	z3fold_page_wlock(zhdr);
 
 found:
 	if (bud == FIRST)
@@ -398,6 +445,7 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
 		zhdr->start_middle = zhdr->first_chunks + 1;
 	}
 
+	spin_lock(&pool->lock);
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
 		/* Add to unbuddied list */
@@ -417,6 +465,8 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
 
 	*handle = encode_handle(zhdr, bud);
 	spin_unlock(&pool->lock);
+	if (bud != HEADLESS)
+		z3fold_page_wunlock(zhdr);
 
 	return 0;
 }
@@ -437,9 +487,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 	int freechunks;
 	struct page *page;
 	enum buddy bud;
-	bool is_unbuddied = false;
 
-	spin_lock(&pool->lock);
 	zhdr = handle_to_z3fold_header(handle);
 	page = virt_to_page(zhdr);
 
@@ -447,10 +495,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 		/* HEADLESS page stored */
 		bud = HEADLESS;
 	} else {
-		is_unbuddied = zhdr->first_chunks == 0 ||
-				zhdr->middle_chunks == 0 ||
-				zhdr->last_chunks == 0;
-
+		z3fold_page_wlock(zhdr);
 		bud = handle_to_buddy(handle);
 
 		switch (bud) {
@@ -467,37 +512,47 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 		default:
 			pr_err("%s: unknown bud %d\n", __func__, bud);
 			WARN_ON(1);
-			spin_unlock(&pool->lock);
+			z3fold_page_wunlock(zhdr);
 			return;
 		}
 	}
 
 	if (test_bit(UNDER_RECLAIM, &page->private)) {
 		/* z3fold page is under reclaim, reclaim will free */
-		spin_unlock(&pool->lock);
+		if (bud != HEADLESS)
+			z3fold_page_wunlock(zhdr);
 		return;
 	}
 
 	/* Remove from existing buddy list */
-	if (bud != HEADLESS)
+	if (bud != HEADLESS) {
+		spin_lock(&pool->lock);
 		list_del(&zhdr->buddy);
+		spin_unlock(&pool->lock);
+	}
 
 	if (bud == HEADLESS ||
 	    (zhdr->first_chunks == 0 && zhdr->middle_chunks == 0 &&
 			zhdr->last_chunks == 0)) {
 		/* z3fold page is empty, free */
+		spin_lock(&pool->lock);
 		list_del(&page->lru);
+		spin_unlock(&pool->lock);
 		clear_bit(PAGE_HEADLESS, &page->private);
+		if (bud != HEADLESS)
+			z3fold_page_wunlock(zhdr);
 		free_z3fold_page(zhdr);
 		atomic64_dec(&pool->pages_nr);
 	} else {
 		z3fold_compact_page(zhdr);
 		/* Add to the unbuddied list */
+		spin_lock(&pool->lock);
 		freechunks = num_free_chunks(zhdr);
 		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+		spin_unlock(&pool->lock);
+		z3fold_page_wunlock(zhdr);
 	}
 
-	spin_unlock(&pool->lock);
 }
 
 /**
@@ -558,6 +613,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 		zhdr = page_address(page);
 		if (!test_bit(PAGE_HEADLESS, &page->private)) {
 			list_del(&zhdr->buddy);
+			spin_unlock(&pool->lock);
+			z3fold_page_rlock(zhdr);
 
 			/*
 			 * We need encode the handles before unlocking, since
@@ -573,13 +630,13 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 				middle_handle = encode_handle(zhdr, MIDDLE);
 			if (zhdr->last_chunks)
 				last_handle = encode_handle(zhdr, LAST);
+			z3fold_page_runlock(zhdr);
 		} else {
 			first_handle = encode_handle(zhdr, HEADLESS);
 			last_handle = middle_handle = 0;
+			spin_unlock(&pool->lock);
 		}
 
-		spin_unlock(&pool->lock);
-
 		/* Issue the eviction callback(s) */
 		if (middle_handle) {
 			ret = pool->ops->evict(pool, middle_handle);
@@ -597,7 +654,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 				goto next;
 		}
 next:
-		spin_lock(&pool->lock);
+		if (!test_bit(PAGE_HEADLESS, &page->private))
+			z3fold_page_wlock(zhdr);
 		clear_bit(UNDER_RECLAIM, &page->private);
 		if ((test_bit(PAGE_HEADLESS, &page->private) && ret == 0) ||
 		    (zhdr->first_chunks == 0 && zhdr->last_chunks == 0 &&
@@ -607,19 +665,22 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			 * return success.
 			 */
 			clear_bit(PAGE_HEADLESS, &page->private);
+			if (!test_bit(PAGE_HEADLESS, &page->private))
+				z3fold_page_wunlock(zhdr);
 			free_z3fold_page(zhdr);
 			atomic64_dec(&pool->pages_nr);
-			spin_unlock(&pool->lock);
 			return 0;
 		}  else if (!test_bit(PAGE_HEADLESS, &page->private)) {
 			if (zhdr->first_chunks != 0 &&
 			    zhdr->last_chunks != 0 &&
 			    zhdr->middle_chunks != 0) {
 				/* Full, add to buddied list */
+				spin_lock(&pool->lock);
 				list_add(&zhdr->buddy, &pool->buddied);
 			} else {
 				z3fold_compact_page(zhdr);
 				/* add to unbuddied list */
+				spin_lock(&pool->lock);
 				freechunks = num_free_chunks(zhdr);
 				list_add(&zhdr->buddy,
 					 &pool->unbuddied[freechunks]);
@@ -630,6 +691,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 		list_add(&page->lru, &pool->lru);
 	}
 	spin_unlock(&pool->lock);
+	if (!test_bit(PAGE_HEADLESS, &page->private))
+		z3fold_page_wunlock(zhdr);
 	return -EAGAIN;
 }
 
@@ -650,7 +713,6 @@ static void *z3fold_map(struct z3fold_pool *pool, unsigned long handle)
 	void *addr;
 	enum buddy buddy;
 
-	spin_lock(&pool->lock);
 	zhdr = handle_to_z3fold_header(handle);
 	addr = zhdr;
 	page = virt_to_page(zhdr);
@@ -658,6 +720,7 @@ static void *z3fold_map(struct z3fold_pool *pool, unsigned long handle)
 	if (test_bit(PAGE_HEADLESS, &page->private))
 		goto out;
 
+	z3fold_page_rlock(zhdr);
 	buddy = handle_to_buddy(handle);
 	switch (buddy) {
 	case FIRST:
@@ -676,8 +739,9 @@ static void *z3fold_map(struct z3fold_pool *pool, unsigned long handle)
 		addr = NULL;
 		break;
 	}
+
+	z3fold_page_runlock(zhdr);
 out:
-	spin_unlock(&pool->lock);
 	return addr;
 }
 
@@ -692,19 +756,17 @@ static void z3fold_unmap(struct z3fold_pool *pool, unsigned long handle)
 	struct page *page;
 	enum buddy buddy;
 
-	spin_lock(&pool->lock);
 	zhdr = handle_to_z3fold_header(handle);
 	page = virt_to_page(zhdr);
 
-	if (test_bit(PAGE_HEADLESS, &page->private)) {
-		spin_unlock(&pool->lock);
+	if (test_bit(PAGE_HEADLESS, &page->private))
 		return;
-	}
 
+	z3fold_page_rlock(zhdr);
 	buddy = handle_to_buddy(handle);
 	if (buddy == MIDDLE)
 		clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
-	spin_unlock(&pool->lock);
+	z3fold_page_runlock(zhdr);
 }
 
 /**
-- 
2.4.2
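
For illustration, the scheme above packs the whole lock state into one
atomic counter: bits 0..14 hold the number of readers and bit 15
(Z3FOLD_PAGE_WRITE_FLAG) marks a writer, so a reader spins only while a
writer holds the lock and a writer spins until the counter drops to
zero. Below is a minimal standalone C11 model of that scheme; the names
and the use of <stdatomic.h> are illustrative only, the kernel code
above uses atomic_t, atomic_add_unless() and cpu_relax() instead.

#include <stdatomic.h>

#define WRITE_FLAG (1 << 15)

/* readers in bits 0..14, writer in bit 15; initialise v to 0 */
struct page_lock { atomic_int v; };

static void rlock(struct page_lock *l)
{
	int old = atomic_load(&l->v);

	for (;;) {
		if (old == WRITE_FLAG) {	/* a writer holds the lock */
			old = atomic_load(&l->v);
			continue;
		}
		/* add a reader; on failure 'old' is refreshed by the CAS */
		if (atomic_compare_exchange_weak(&l->v, &old, old + 1))
			break;
	}
}

static void runlock(struct page_lock *l)
{
	atomic_fetch_sub(&l->v, 1);
}

static void wlock(struct page_lock *l)
{
	int expected = 0;

	/* a writer may enter only when there are no readers and no writer */
	while (!atomic_compare_exchange_weak(&l->v, &expected, WRITE_FLAG))
		expected = 0;
}

static void wunlock(struct page_lock *l)
{
	atomic_fetch_sub(&l->v, WRITE_FLAG);
}

All operations above default to seq_cst ordering, so the model is fully
ordered; the posted kernel version relies on explicit smp_mb() calls
instead, which is what the review further down takes issue with.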


* Re: [PATCH/RFC] z3fold: use per-page read/write lock
  2016-11-05 13:49 [PATCH/RFC] z3fold: use per-page read/write lock Vitaly Wool
@ 2016-11-05 23:38 ` Andi Kleen
  2016-11-06  9:42   ` Vitaly Wool
  2016-11-10 19:38 ` Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2016-11-05 23:38 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: Linux-MM, linux-kernel, Dan Streetman, Andrew Morton

Vitaly Wool <vitalywool@gmail.com> writes:

> Most z3fold operations are in-page, such as modifying the z3fold
> page header or moving z3fold objects within a page. Taking the
> per-pool spinlock to protect per-page objects is therefore
> suboptimal, and the idea of having a per-page spinlock (or rwlock)
> has been around for some time. However, adding one directly to the
> z3fold header makes the latter too big on some systems to fit in
> a single chunk.

> +	atomic_t page_lock;

This doesn't make much sense. A standard spinlock is not bigger
than 4 bytes either. Also, reinventing locks is usually a bad
idea: they are tricky to get right, you have no debugging support,
and they are hard to analyze.

-Andi


* Re: [PATCH/RFC] z3fold: use per-page read/write lock
  2016-11-05 23:38 ` Andi Kleen
@ 2016-11-06  9:42   ` Vitaly Wool
  2016-11-07 16:06     ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Vitaly Wool @ 2016-11-06  9:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux-MM, LKML, Dan Streetman, Andrew Morton

On Sun, Nov 6, 2016 at 12:38 AM, Andi Kleen <andi@firstfloor.org> wrote:
> Vitaly Wool <vitalywool@gmail.com> writes:
>
>> Most z3fold operations are in-page, such as modifying the z3fold
>> page header or moving z3fold objects within a page. Taking the
>> per-pool spinlock to protect per-page objects is therefore
>> suboptimal, and the idea of having a per-page spinlock (or rwlock)
>> has been around for some time. However, adding one directly to the
>> z3fold header makes the latter too big on some systems to fit in
>> a single chunk.
>
>> +     atomic_t page_lock;
>
> This doesn't make much sense. A standard spinlock is not bigger
> than 4 bytes either. Also, reinventing locks is usually a bad
> idea: they are tricky to get right, you have no debugging support,
> and they are hard to analyze.

I understand the reinvention part but you're not quite accurate here
with the numbers.

E. g. on x86_64:
(gdb) p sizeof(rwlock_t)
$1 = 8

I believe a DIY lock is justified here, since the variant with
rwlock_t actually drew complaints from the kbuild test robot when
building the previous version of this patch [1] with gcc-6.0 for x86_64:

 In file included from arch/x86/include/asm/atomic.h:4:0,
                    from include/linux/atomic.h:4,
                    from mm/z3fold.c:25:
   mm/z3fold.c: In function 'init_z3fold':
>> include/linux/compiler.h:518:38: error: call to '__compiletime_assert_808' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct z3fold_header) > ZHDR_SIZE_ALIGNED

~vitaly

[1] https://patchwork.kernel.org/patch/9384871/
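
For context, that assertion lives in z3fold's module init and requires
the header to fit within a single chunk. A rough sketch of the check
and the constants involved (reconstructed here for illustration; the
exact definitions may differ between kernel versions):

#define NCHUNKS_ORDER	6
#define CHUNK_SHIFT	(PAGE_SHIFT - NCHUNKS_ORDER)
#define CHUNK_SIZE	(1 << CHUNK_SHIFT)	/* 64 bytes with 4K pages */
#define ZHDR_SIZE_ALIGNED CHUNK_SIZE

static int __init init_z3fold(void)
{
	/* the z3fold header must fit into a single chunk */
	BUILD_BUG_ON(sizeof(struct z3fold_header) > ZHDR_SIZE_ALIGNED);

	zpool_register_driver(&z3fold_zpool_driver);
	return 0;
}

With 4K pages that leaves 64 bytes for the whole header, including
whatever lock it embeds, which is presumably what the robot's debug
configuration exceeded once an rwlock_t was added.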


* Re: [PATCH/RFC] z3fold: use per-page read/write lock
  2016-11-06  9:42   ` Vitaly Wool
@ 2016-11-07 16:06     ` Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2016-11-07 16:06 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: Andi Kleen, Linux-MM, LKML, Dan Streetman, Andrew Morton

> I understand the reinvention part but you're not quite accurate here
> with the numbers.
> 
> E. g. on x86_64:
> (gdb) p sizeof(rwlock_t)
> $1 = 8

I was talking about spinlocks, which are 4 bytes. Just use a spinlock then.
rwlocks are usually a bad idea anyway because they often scale far worse than
spinlocks due to their cache line bouncing behavior, and the read/write split
doesn't make much difference unless your critical section is very long.

-Andi
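
A minimal sketch of that suggestion (not part of the posted patch;
whether the resulting header still passes the ZHDR_SIZE_ALIGNED check
on debug configurations is exactly the open question here):

struct z3fold_header {
	struct list_head buddy;
	spinlock_t page_lock;	/* 4 bytes without spinlock debugging */
	unsigned short first_chunks;
	unsigned short middle_chunks;
	unsigned short last_chunks;
	unsigned short start_middle;
	unsigned char first_num:2;
};

with spin_lock_init(&zhdr->page_lock) added to init_z3fold_page() and
plain spin_lock()/spin_unlock() replacing the hand-rolled helpers.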


* Re: [PATCH/RFC] z3fold: use per-page read/write lock
  2016-11-05 13:49 [PATCH/RFC] z3fold: use per-page read/write lock Vitaly Wool
  2016-11-05 23:38 ` Andi Kleen
@ 2016-11-10 19:38 ` Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2016-11-10 19:38 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Linux-MM, linux-kernel, Dan Streetman, Andrew Morton, Andi Kleen

On Sat, Nov 05, 2016 at 02:49:46PM +0100, Vitaly Wool wrote:
> +/* Read-lock a z3fold page */
> +static void z3fold_page_rlock(struct z3fold_header *zhdr)
> +{
> +	while (!atomic_add_unless(&zhdr->page_lock, 1, Z3FOLD_PAGE_WRITE_FLAG))
> +		cpu_relax();
> +	smp_mb();
> +}
> +
> +/* Read-unlock a z3fold page */
> +static void z3fold_page_runlock(struct z3fold_header *zhdr)
> +{
> +	atomic_dec(&zhdr->page_lock);
> +	smp_mb();
> +}
> +
> +/* Write-lock a z3fold page */
> +static void z3fold_page_wlock(struct z3fold_header *zhdr)
> +{
> +	while (atomic_cmpxchg(&zhdr->page_lock, 0, Z3FOLD_PAGE_WRITE_FLAG) != 0)
> +		cpu_relax();
> +	smp_mb();
> +}
> +
> +/* Write-unlock a z3fold page */
> +static void z3fold_page_wunlock(struct z3fold_header *zhdr)
> +{
> +	atomic_sub(Z3FOLD_PAGE_WRITE_FLAG, &zhdr->page_lock);
> +	smp_mb();
> +}

This is trivially broken. As Andi said, don't roll your own locks.


The unlocks want: smp_mb__before_atomic() _before_ the atomic for
'obvious' reasons.

Also, this lock has serious starvation issues and is reader biased.
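
Concretely, the unlock-side change being asked for would look roughly
like this (a sketch of one reading of the comment, not a posted patch,
and it does not address the starvation or reader-bias problems):

/* Read-unlock a z3fold page */
static void z3fold_page_runlock(struct z3fold_header *zhdr)
{
	/* order the critical section before the releasing atomic */
	smp_mb__before_atomic();
	atomic_dec(&zhdr->page_lock);
}

/* Write-unlock a z3fold page */
static void z3fold_page_wunlock(struct z3fold_header *zhdr)
{
	smp_mb__before_atomic();
	atomic_sub(Z3FOLD_PAGE_WRITE_FLAG, &zhdr->page_lock);
}

The smp_mb() placed after the atomic in the posted version does not
provide release semantics: on weakly ordered architectures, stores from
the critical section can still be reordered past the unlocking atomic
itself.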


