All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Jackson <pj@sgi.com>
To: akpm@osdl.org
Cc: linux-kernel@vger.kernel.org,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Simon Derr <Simon.Derr@bull.net>, Andi Kleen <ak@suse.de>,
	Paul Jackson <pj@sgi.com>, Christoph Lameter <clameter@sgi.com>
Subject: [PATCH] Cpuset: rcu optimization of page alloc hook
Date: Sun, 11 Dec 2005 15:31:30 -0800 (PST)	[thread overview]
Message-ID: <20051211233130.18000.2748.sendpatchset@jackhammer.engr.sgi.com> (raw)

Optimize the cpuset impact on page allocation, the most
performance critical cpuset hook in the kernel.

On each page allocation, the cpuset hook needs to check for a
possible change in the current tasks cpuset.  It can now handle
the common case, of no change, without taking any spinlock or
semaphore, thanks to RCU.

Convert a spinlock on the current task to an rcu_read_lock(),
saving approximately a memory barrier and an atomic op, depending
on architecture.

This is done by putting struct cpusets in a slab cache marked
SLAB_DESTROY_BY_RCU, so that an rcu_read_lock can ensure that
the slab won't free the memory behind a cpuset struct.

Thanks to Andi Kleen and Nick Piggin for the suggestion.

Signed-off-by: Paul Jackson <pj@sgi.com>

---

 kernel/cpuset.c |   42 +++++++++++++++++++++++++++++++-----------
 1 files changed, 31 insertions(+), 11 deletions(-)

--- 2.6.15-rc3-mm1.orig/kernel/cpuset.c	2005-12-11 13:31:16.925866678 -0800
+++ 2.6.15-rc3-mm1/kernel/cpuset.c	2005-12-11 14:26:57.184694603 -0800
@@ -63,6 +63,18 @@
  */
 int number_of_cpusets;
 
+/*
+ * Take cpusets from a slab cache marked SLAB_DESTROY_BY_RCU, so
+ * that on the critical code path cpuset_update_task_memory_state(),
+ * we can safely read task->cpuset->mems_generation with just
+ * an rcu lock, w/o risk of a slab cache free invalidating that
+ * memory location.  It might not be our tasks cpuset anymore,
+ * but at least it will be safe to load from it, which is good
+ * enough when checking for a changed mems_generation.
+ */
+
+static kmem_cache_t *cpuset_cache;
+
 /* See "Frequency meter" comments, below. */
 
 struct fmeter {
@@ -248,6 +260,10 @@ static struct super_block *cpuset_sb;
  * a tasks cpuset pointer we use task_lock(), which acts on a spinlock
  * (task->alloc_lock) already in the task_struct routinely used for
  * such matters.
+ *
+ * P.S.  One more locking exception.  See the use of rcu_read_lock
+ * when checking a tasks cpuset->mems_generation in the routine
+ * cpuset_update_task_memory_state below.
  */
 
 static DECLARE_MUTEX(manage_sem);
@@ -289,7 +305,7 @@ static void cpuset_diput(struct dentry *
 	if (S_ISDIR(inode->i_mode)) {
 		struct cpuset *cs = dentry->d_fsdata;
 		BUG_ON(!(is_removed(cs)));
-		kfree(cs);
+		kmem_cache_free(cpuset_cache, cs);
 	}
 	iput(inode);
 }
@@ -612,12 +628,11 @@ static void guarantee_online_mems(const 
  * cpuset pointer.  This routine also might acquire callback_sem and
  * current->mm->mmap_sem during call.
  *
- * The task_lock() is required to dereference current->cpuset safely.
- * Without it, we could pick up the pointer value of current->cpuset
- * in one instruction, and then attach_task could give us a different
- * cpuset, and then the cpuset we had could be removed and freed,
- * and then on our next instruction, we could dereference a no longer
- * valid cpuset pointer to get its mems_generation field.
+ * Reading current->cpuset->mems_generation doesn't need task_lock,
+ * because cpusets are on a slab cache marked SLAB_DESTROY_BY_RCU,
+ * so the rcu_read_lock() insures the memory read will not yet be
+ * unmapped.  It's ok if attach_task() replaces our cpuset with
+ * another while we are reading mems_generation, and even frees it.
  *
  * This routine is needed to update the per-task mems_allowed data,
  * within the tasks context, when it is trying to allocate memory
@@ -634,9 +649,9 @@ void cpuset_update_task_memory_state()
 	if (unlikely(!cs))
 		return;
 
-	task_lock(tsk);
+	rcu_read_lock();
 	my_cpusets_mem_gen = cs->mems_generation;
-	task_unlock(tsk);
+	rcu_read_unlock();
 
 	if (my_cpusets_mem_gen != tsk->cpuset_mems_generation) {
 		down(&callback_sem);
@@ -1742,7 +1757,7 @@ static long cpuset_create(struct cpuset 
 	struct cpuset *cs;
 	int err;
 
-	cs = kmalloc(sizeof(*cs), GFP_KERNEL);
+	cs = kmem_cache_alloc(cpuset_cache, GFP_KERNEL);
 	if (!cs)
 		return -ENOMEM;
 
@@ -1784,7 +1799,7 @@ static long cpuset_create(struct cpuset 
 err:
 	list_del(&cs->sibling);
 	up(&manage_sem);
-	kfree(cs);
+	kmem_cache_free(cpuset_cache, cs);
 	return err;
 }
 
@@ -1847,6 +1862,11 @@ int __init cpuset_init(void)
 	struct dentry *root;
 	int err;
 
+	cpuset_cache = kmem_cache_create("cpuset_cache",
+					sizeof(struct cpuset), 0,
+					SLAB_DESTROY_BY_RCU | SLAB_PANIC,
+					NULL, NULL);
+
 	top_cpuset.cpus_allowed = CPU_MASK_ALL;
 	top_cpuset.mems_allowed = NODE_MASK_ALL;
 

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

             reply	other threads:[~2005-12-11 23:31 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-11 23:31 Paul Jackson [this message]
2005-12-12  3:29 ` [PATCH] Cpuset: rcu optimization of page alloc hook Andi Kleen
2005-12-12  6:11   ` Paul Jackson
2005-12-12  6:21     ` Andi Kleen
2005-12-12  6:50       ` Paul Jackson
2005-12-12  8:49 ` Eric Dumazet
2005-12-12  8:54   ` Nick Piggin
2005-12-12  9:06     ` Eric Dumazet
2005-12-12  9:11     ` Andrew Morton
2005-12-12  9:38       ` Nick Piggin
2005-12-12 10:02   ` Paul Jackson
2005-12-12 10:12     ` Andrew Morton
2005-12-13 15:53       ` Paul Jackson
2005-12-13 16:31         ` Eric Dumazet
2005-12-13 17:42           ` Christoph Lameter
2005-12-13 17:56             ` Eric Dumazet
2005-12-13 18:07               ` Christoph Lameter
2005-12-13 21:03               ` Paul Jackson
2005-12-13 21:16                 ` Christoph Lameter
2005-12-13 21:38                 ` Eric Dumazet
2005-12-13 22:23                   ` Paul Jackson
2005-12-13 22:29                     ` Christoph Lameter
2005-12-14  3:54                     ` Paul Jackson
2005-12-14  4:02                       ` Andi Kleen
2005-12-14  4:06                         ` Paul Jackson
2005-12-14  8:06                     ` Eric Dumazet
2005-12-14  8:40                       ` Paul Jackson
2005-12-13 20:08           ` Paul Jackson
2005-12-13 20:29             ` Eric Dumazet
2005-12-13 22:35               ` Paul Jackson
2005-12-13 21:44           ` Paul Jackson
2005-12-13 17:37         ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051211233130.18000.2748.sendpatchset@jackhammer.engr.sgi.com \
    --to=pj@sgi.com \
    --cc=Simon.Derr@bull.net \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=clameter@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.