linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock
@ 2013-07-04  1:52 Waiman Long
  2013-07-04  1:52 ` [PATCH 1/4] seqlock: Add a new blocking reader type Waiman Long
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-04  1:52 UTC (permalink / raw)
  To: Alexander Viro, Thomas Gleixner
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J

During some perf-record sessions of the kernel running the high_systime
workload of the AIM7 benchmark, it was found that quite a large portion
of the spinlock contention was due to the perf_event_mmap_event()
function itself. This perf kernel function calls d_path() which,
in turn, calls path_get() and dput() indirectly. These 3 functions
were the hottest functions shown in the perf-report output of
the _raw_spin_lock() function in an 8-socket system with 80 cores
(hyperthreading off) with a 3.10-rc1 kernel:

-  13.91%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
   - _raw_spin_lock
      + 35.54% path_get
      + 34.85% dput
      + 19.49% d_path

In fact, the output of the "perf record -s -a" (without call-graph)
showed:

 13.37%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
  7.61%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
  3.54%            true  [kernel.kallsyms]     [k] _raw_spin_lock

Without the perf monitoring tool running, the actual execution profile
is quite different. In fact, with this patch set and my other
lockless reference count update patch applied, the output of the same
"perf record -s -a" command became:

  2.82%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
  1.11%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
  0.26%            true  [kernel.kallsyms]     [k] _raw_spin_lock

So the time spent in the _raw_spin_lock() function went down from 24.52%
to 4.19%. It can be seen that the performance data collected by the
perf-record command can be heavily skewed in some cases on a system
with a large number of CPUs. This set of patches enables the perf
command to give a more accurate and reliable picture of what is really
happening in the system. At the same time, they can also improve the
general performance of systems, especially those with a large number
of CPUs.

The d_path() function takes the following two locks:

1. dentry->d_lock [spinlock] from dget()/dget_parent()/dput()
2. rename_lock    [seqlock]  from d_path()

This set of patches addresses the rename_lock bottleneck by changing the
way seqlock is implemented so that we can optionally use a read/write
lock as the underlying implementation instead of the default spinlock.
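
For context, here is a rough sketch of the two kinds of rename_lock
users in fs/dcache.c before this series (simplified; the real call
chains are longer):

	/* A real writer, e.g. a rename, actually modifies the dentry tree */
	write_seqlock(&rename_lock);
	/* ... __d_move() rearranges dentries ... */
	write_sequnlock(&rename_lock);

	/* d_path() and friends: read-only, but they take the same writer
	 * lock because they cannot tolerate a concurrent rename while
	 * assembling the path string.
	 */
	write_seqlock(&rename_lock);
	error = prepend_path(path, root, &res, &buflen);
	write_sequnlock(&rename_lock);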

Incidentally, this patch set also provides a slight 5% performance boost
over just the lockless reference count update patch in the short
workload of the AIM7 benchmark running on an 8-socket 80-core DL980
machine on a 3.10-based kernel. A few percentage points of contention
were still left in d_path() and the getcwd syscall due to their use of
the rename_lock.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>

Waiman Long (4):
  seqlock: Add a new blocking reader type
  dcache: Use blocking reader seqlock when protected data are not
    changed
  seqlock: Allow the use of rwlock in seqlock
  dcache: Use rwlock as the underlying lock in rename_lock

 fs/dcache.c             |   28 ++++----
 include/linux/seqlock.h |  167 ++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 158 insertions(+), 37 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] seqlock: Add a new blocking reader type
  2013-07-04  1:52 [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
@ 2013-07-04  1:52 ` Waiman Long
  2013-07-04  1:52 ` [PATCH 2/4] dcache: Use blocking reader seqlock when protected data are not changed Waiman Long
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-04  1:52 UTC (permalink / raw)
  To: Alexander Viro, Thomas Gleixner
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J

The sequence lock (seqlock) was originally designed for cases where
the readers do not need to block the writers: the readers simply retry
the read operation if the data change under them.

Since then, the use cases have been expanded to include situations
where a thread does not need to change the data (effectively a reader)
but has to take the writer lock because it can't tolerate changes
to the underlying structure. Some examples are the d_path() function
and the getcwd() syscall in fs/dcache.c where the functions take
the writer lock on rename_lock even though they don't need to change
anything in the protected data structure at all. This is inefficient
as a reader is now blocking other non-blocking readers by pretending
to be a writer.

This patch tries to eliminate this inefficiency by introducing a new
type of blocking reader to the seqlock locking mechanism. This new
blocking reader will not block other non-blocking readers, but will
block other blocking readers and writers.
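
As a minimal illustration (not part of this patch; foo_lock and the
protected data are made up), the two reader types look like this:

	static DEFINE_SEQLOCK(foo_lock);
	unsigned seq;

	/* Non-blocking reader: never blocks, but may have to retry */
	do {
		seq = read_seqbegin(&foo_lock);
		/* ... read the protected data ... */
	} while (read_seqretry(&foo_lock, seq));

	/* Blocking reader (new): no retry loop; writers and other
	 * blocking readers are held off, but the sequence count is
	 * left alone, so non-blocking readers are not disturbed.
	 */
	read_seqlock(&foo_lock);
	/* ... read the protected data, follow pointers safely ... */
	read_sequnlock(&foo_lock);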

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/linux/seqlock.h |   65 +++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 1829905..26be0d9 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -3,15 +3,18 @@
 /*
  * Reader/writer consistent mechanism without starving writers. This type of
  * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes.  Readers never
- * block but they may have to retry if a writer is in
- * progress. Writers do not wait for readers. 
+ * and is willing to retry if the information changes. There are two types
+ * of readers:
+ * 1. Non-blocking readers which never block but they may have to retry if
+ *    a writer is in progress. Writers do not wait for non-blocking readers.
+ * 2. Blocking readers which will block if a writer is in progress. A
+ *    blocking reader in progress will also block a writer.
  *
- * This is not as cache friendly as brlock. Also, this will not work
+ * This is not as cache friendly as brlock. Also, this may not work well
  * for data that contains pointers, because any writer could
  * invalidate a pointer that a reader was following.
  *
- * Expected reader usage:
+ * Expected non-blocking reader usage:
  * 	do {
  *	    seq = read_seqbegin(&foo);
  * 	...
@@ -268,4 +271,56 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/*
+ * The blocking reader locks out writers, but doesn't update the count.
+ * Acts like a normal spin_lock/unlock.
+ * Don't need preempt_disable() because that is in the spin_lock already.
+ */
+static inline void read_seqlock(seqlock_t *sl)
+{
+	spin_lock(&sl->lock);
+}
+
+static inline void read_sequnlock(seqlock_t *sl)
+{
+	spin_unlock(&sl->lock);
+}
+
+static inline void read_seqlock_bh(seqlock_t *sl)
+{
+	spin_lock_bh(&sl->lock);
+}
+
+static inline void read_sequnlock_bh(seqlock_t *sl)
+{
+	spin_unlock_bh(&sl->lock);
+}
+
+static inline void read_seqlock_irq(seqlock_t *sl)
+{
+	spin_lock_irq(&sl->lock);
+}
+
+static inline void read_sequnlock_irq(seqlock_t *sl)
+{
+	spin_unlock_irq(&sl->lock);
+}
+
+static inline unsigned long __read_seqlock_irqsave(seqlock_t *sl)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sl->lock, flags);
+	return flags;
+}
+
+#define read_seqlock_irqsave(lock, flags)				\
+	do { flags = __read_seqlock_irqsave(lock); } while (0)
+
+static inline void
+read_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
+{
+	spin_unlock_irqrestore(&sl->lock, flags);
+}
+
 #endif /* __LINUX_SEQLOCK_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] dcache: Use blocking reader seqlock when protected data are not changed
  2013-07-04  1:52 [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
  2013-07-04  1:52 ` [PATCH 1/4] seqlock: Add a new blocking reader type Waiman Long
@ 2013-07-04  1:52 ` Waiman Long
  2013-07-04  1:52 ` [PATCH 3/4] seqlock: Allow the use of rwlock in seqlock Waiman Long
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-04  1:52 UTC (permalink / raw)
  To: Alexander Viro, Thomas Gleixner
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J

The following functions in fs/dcache.c are changed to use the blocking
reader seqlock, as none of them modify any of the dentries involved.
This allows non-blocking readers to proceed while a blocking reader
holds the lock.

1. __d_path
2. d_absolute_path
3. d_path
4. dentry_path_raw
5. dentry_path
6. getcwd syscall

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 fs/dcache.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f09b908..480c81f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2603,9 +2603,9 @@ char *__d_path(const struct path *path,
 
 	prepend(&res, &buflen, "\0", 1);
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
+	read_seqlock(&rename_lock);
 	error = prepend_path(path, root, &res, &buflen);
-	write_sequnlock(&rename_lock);
+	read_sequnlock(&rename_lock);
 	br_read_unlock(&vfsmount_lock);
 
 	if (error < 0)
@@ -2624,9 +2624,9 @@ char *d_absolute_path(const struct path *path,
 
 	prepend(&res, &buflen, "\0", 1);
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
+	read_seqlock(&rename_lock);
 	error = prepend_path(path, &root, &res, &buflen);
-	write_sequnlock(&rename_lock);
+	read_sequnlock(&rename_lock);
 	br_read_unlock(&vfsmount_lock);
 
 	if (error > 1)
@@ -2692,9 +2692,9 @@ char *d_path(const struct path *path, char *buf, int buflen)
 
 	get_fs_root(current->fs, &root);
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
+	read_seqlock(&rename_lock);
 	error = path_with_deleted(path, &root, &res, &buflen);
-	write_sequnlock(&rename_lock);
+	read_sequnlock(&rename_lock);
 	br_read_unlock(&vfsmount_lock);
 	if (error < 0)
 		res = ERR_PTR(error);
@@ -2762,9 +2762,9 @@ char *dentry_path_raw(struct dentry *dentry, char *buf, int buflen)
 {
 	char *retval;
 
-	write_seqlock(&rename_lock);
+	read_seqlock(&rename_lock);
 	retval = __dentry_path(dentry, buf, buflen);
-	write_sequnlock(&rename_lock);
+	read_sequnlock(&rename_lock);
 
 	return retval;
 }
@@ -2775,7 +2775,7 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen)
 	char *p = NULL;
 	char *retval;
 
-	write_seqlock(&rename_lock);
+	read_seqlock(&rename_lock);
 	if (d_unlinked(dentry)) {
 		p = buf + buflen;
 		if (prepend(&p, &buflen, "//deleted", 10) != 0)
@@ -2783,7 +2783,7 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen)
 		buflen++;
 	}
 	retval = __dentry_path(dentry, buf, buflen);
-	write_sequnlock(&rename_lock);
+	read_sequnlock(&rename_lock);
 	if (!IS_ERR(retval) && p)
 		*p = '/';	/* restore '/' overriden with '\0' */
 	return retval;
@@ -2822,7 +2822,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
 
 	error = -ENOENT;
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
+	read_seqlock(&rename_lock);
 	if (!d_unlinked(pwd.dentry)) {
 		unsigned long len;
 		char *cwd = page + PAGE_SIZE;
@@ -2830,7 +2830,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
 
 		prepend(&cwd, &buflen, "\0", 1);
 		error = prepend_path(&pwd, &root, &cwd, &buflen);
-		write_sequnlock(&rename_lock);
+		read_sequnlock(&rename_lock);
 		br_read_unlock(&vfsmount_lock);
 
 		if (error < 0)
@@ -2851,7 +2851,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
 				error = -EFAULT;
 		}
 	} else {
-		write_sequnlock(&rename_lock);
+		read_sequnlock(&rename_lock);
 		br_read_unlock(&vfsmount_lock);
 	}
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] seqlock: Allow the use of rwlock in seqlock
  2013-07-04  1:52 [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
  2013-07-04  1:52 ` [PATCH 1/4] seqlock: Add a new blocking reader type Waiman Long
  2013-07-04  1:52 ` [PATCH 2/4] dcache: Use blocking reader seqlock when protected data are not changed Waiman Long
@ 2013-07-04  1:52 ` Waiman Long
  2013-07-04  1:52 ` [PATCH 4/4] dcache: Use rwlock as the underlying lock in rename_lock Waiman Long
  2013-08-23 22:49 ` [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
  4 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-04  1:52 UTC (permalink / raw)
  To: Alexander Viro, Thomas Gleixner
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J

For use cases where there are many more blocking readers than writers,
it is beneficial performance-wise to use a read/write lock instead of
a spinlock. However, a read/write lock is non-deterministic and can be
problematic in some situations, so a wholesale conversion of the
underlying lock in seqlock to a read/write lock would not be
appropriate.

This patch allows a seqlock user to choose either a spinlock or a
read/write lock as the underlying lock at initialization time. Once
the choice is made, it cannot be changed later. To use an underlying
read/write lock, either the seqrwlock_init() function or the
DEFINE_SEQRWLOCK() macro has to be used at initialization time.
There is a slight overhead of an additional conditional branch with
that change, but it should be insignificant when compared with the
overhead of the actual locking and unlocking operations.
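
A minimal usage sketch (foo_lock and the functions around it are made
up for illustration): the type of underlying lock is picked once when
the lock is defined, and the rest of the seqlock API stays the same:

	/* rwlock-backed seqlock: blocking readers can run concurrently */
	static DEFINE_SEQRWLOCK(foo_lock);
	/* or, for run-time initialization: seqrwlock_init(&foo_lock); */

	void foo_update(void)			/* writer */
	{
		write_seqlock(&foo_lock);	/* -> write_lock() */
		/* ... modify the protected data ... */
		write_sequnlock(&foo_lock);
	}

	void foo_walk(void)			/* blocking reader */
	{
		read_seqlock(&foo_lock);	/* -> read_lock() */
		/* ... read without a retry loop ... */
		read_sequnlock(&foo_lock);
	}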

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/linux/seqlock.h |  118 ++++++++++++++++++++++++++++++++++++----------
 1 files changed, 92 insertions(+), 26 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 26be0d9..a1fd45c 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -20,7 +20,6 @@
  * 	...
  *      } while (read_seqretry(&foo, seq));
  *
- *
  * On non-SMP the spin locks disappear but the writer still needs
  * to increment the sequence variables because an interrupt routine could
  * change the state of the data.
@@ -176,28 +175,51 @@ static inline void write_seqcount_barrier(seqcount_t *s)
 
 typedef struct {
 	struct seqcount seqcount;
-	spinlock_t lock;
+	const bool use_rwlock;
+	union {
+		spinlock_t slock;
+		rwlock_t rwlock;
+	};
 } seqlock_t;
 
 /*
  * These macros triggered gcc-3.x compile-time problems.  We think these are
  * OK now.  Be cautious.
  */
-#define __SEQLOCK_UNLOCKED(lockname)			\
-	{						\
-		.seqcount = SEQCNT_ZERO,		\
-		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
+#define __SEQLOCK_UNLOCKED(lockname)				\
+	{							\
+		.seqcount = SEQCNT_ZERO,			\
+		.use_rwlock = false,				\
+		{ .slock = __SPIN_LOCK_UNLOCKED(lockname) }	\
+	}
+
+#define __SEQRWLOCK_UNLOCKED(lockname)				\
+	{							\
+		.seqcount = SEQCNT_ZERO,			\
+		.use_rwlock = true,				\
+		{ .rwlock = __RW_LOCK_UNLOCKED(lockname) }	\
 	}
 
-#define seqlock_init(x)					\
-	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+#define seqlock_init(x)						\
+	do {							\
+		seqcount_init(&(x)->seqcount);			\
+		spin_lock_init(&(x)->slock);			\
+		*(bool *)(&(x)->use_rwlock) = false;		\
+	} while (0)
+
+#define seqrwlock_init(x)					\
+	do {							\
+		seqcount_init(&(x)->seqcount);			\
+		rwlock_init(&(x)->rwlock);			\
+		*(bool *)(&(x)->use_rwlock) = true;		\
 	} while (0)
 
 #define DEFINE_SEQLOCK(x) \
 		seqlock_t x = __SEQLOCK_UNLOCKED(x)
 
+#define DEFINE_SEQRWLOCK(x) \
+		seqlock_t x = __SEQRWLOCK_UNLOCKED(x)
+
 /*
  * Read side functions for starting and finalizing a read side section.
  */
@@ -212,51 +234,86 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 }
 
 /*
+ * Locking and unlocking macros
+ */
+#define	__SEQRLOCK(sl, suffix)					\
+	do {							\
+		if ((sl)->use_rwlock)				\
+			read_lock ## suffix(&(sl)->rwlock);	\
+		else						\
+			spin_lock ## suffix(&(sl)->slock);	\
+	} while (0)
+#define	__SEQWLOCK(sl, suffix)					\
+	do {							\
+		if ((sl)->use_rwlock)				\
+			write_lock ## suffix(&(sl)->rwlock);	\
+		else						\
+			spin_lock ## suffix(&(sl)->slock);	\
+	} while (0)
+#define	__SEQRUNLOCK(sl, suffix)				\
+	do {							\
+		if ((sl)->use_rwlock)				\
+			read_unlock ## suffix(&(sl)->rwlock);	\
+		else						\
+			spin_unlock ## suffix(&(sl)->slock);	\
+	} while (0)
+#define	__SEQWUNLOCK(sl, suffix)				\
+	do {							\
+		if ((sl)->use_rwlock)				\
+			write_unlock ## suffix(&(sl)->rwlock);	\
+		else						\
+			spin_unlock ## suffix(&(sl)->slock);	\
+	} while (0)
+
+/*
  * Lock out other writers and update the count.
  * Acts like a normal spin_lock/unlock.
  * Don't need preempt_disable() because that is in the spin_lock already.
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
-	spin_lock(&sl->lock);
+	__SEQWLOCK(sl, /**/);
 	write_seqcount_begin(&sl->seqcount);
 }
 
 static inline void write_sequnlock(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
-	spin_unlock(&sl->lock);
+	__SEQWUNLOCK(sl, /**/);
 }
 
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
-	spin_lock_bh(&sl->lock);
+	__SEQWLOCK(sl, _bh);
 	write_seqcount_begin(&sl->seqcount);
 }
 
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
-	spin_unlock_bh(&sl->lock);
+	__SEQWUNLOCK(sl, _bh);
 }
 
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
-	spin_lock_irq(&sl->lock);
+	__SEQWLOCK(sl, _irq);
 	write_seqcount_begin(&sl->seqcount);
 }
 
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
-	spin_unlock_irq(&sl->lock);
+	__SEQWUNLOCK(sl, _irq);
 }
 
 static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&sl->lock, flags);
+	if (sl->use_rwlock)
+		write_lock_irqsave(&sl->rwlock, flags);
+	else
+		spin_lock_irqsave(&sl->slock, flags);
 	write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
@@ -268,7 +325,10 @@ static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
 	write_seqcount_end(&sl->seqcount);
-	spin_unlock_irqrestore(&sl->lock, flags);
+	if (sl->use_rwlock)
+		write_unlock_irqrestore(&sl->rwlock, flags);
+	else
+		spin_unlock_irqrestore(&sl->slock, flags);
 }
 
 /*
@@ -278,39 +338,42 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
  */
 static inline void read_seqlock(seqlock_t *sl)
 {
-	spin_lock(&sl->lock);
+	__SEQRLOCK(sl, /**/);
 }
 
 static inline void read_sequnlock(seqlock_t *sl)
 {
-	spin_unlock(&sl->lock);
+	__SEQRUNLOCK(sl, /**/);
 }
 
 static inline void read_seqlock_bh(seqlock_t *sl)
 {
-	spin_lock_bh(&sl->lock);
+	__SEQRLOCK(sl, _bh);
 }
 
 static inline void read_sequnlock_bh(seqlock_t *sl)
 {
-	spin_unlock_bh(&sl->lock);
+	__SEQRUNLOCK(sl, _bh);
 }
 
 static inline void read_seqlock_irq(seqlock_t *sl)
 {
-	spin_lock_irq(&sl->lock);
+	__SEQRLOCK(sl, _irq);
 }
 
 static inline void read_sequnlock_irq(seqlock_t *sl)
 {
-	spin_unlock_irq(&sl->lock);
+	__SEQRUNLOCK(sl, _irq);
 }
 
 static inline unsigned long __read_seqlock_irqsave(seqlock_t *sl)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&sl->lock, flags);
+	if (sl->use_rwlock)
+		read_lock_irqsave(&sl->rwlock, flags);
+	else
+		spin_lock_irqsave(&sl->slock, flags);
 	return flags;
 }
 
@@ -320,7 +383,10 @@ static inline unsigned long __read_seqlock_irqsave(seqlock_t *sl)
 static inline void
 read_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	spin_unlock_irqrestore(&sl->lock, flags);
+	if (sl->use_rwlock)
+		read_unlock_irqrestore(&sl->rwlock, flags);
+	else
+		spin_unlock_irqrestore(&sl->slock, flags);
 }
 
 #endif /* __LINUX_SEQLOCK_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] dcache: Use rwlock as the underlying lock in rename_lock
  2013-07-04  1:52 [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
                   ` (2 preceding siblings ...)
  2013-07-04  1:52 ` [PATCH 3/4] seqlock: Allow the use of rwlock in seqlock Waiman Long
@ 2013-07-04  1:52 ` Waiman Long
  2013-08-23 22:49 ` [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
  4 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-04  1:52 UTC (permalink / raw)
  To: Alexander Viro, Thomas Gleixner
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J

This patch converts the underlying lock in rename_lock from a spinlock
to a read/write lock. This allows multiple blocking readers to proceed
concurrently, which is not possible with a spinlock implementation.

As contention on the rename_lock in the d_path() function is a
bottleneck when the perf tool is used to record performance data on
a large SMP system with many cores, converting rename_lock to a
read/write lock eliminates this bottleneck and the skewing of the
perf trace data.
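
Sketching the intended effect (behaviour only, not code from the patch):

	/* Several tasks in d_path()/getcwd() at the same time: each
	 * read_seqlock(&rename_lock) now maps to read_lock() on the
	 * underlying rwlock, so they can all proceed in parallel.
	 */
	read_seqlock(&rename_lock);
	error = prepend_path(path, root, &res, &buflen);
	read_sequnlock(&rename_lock);

	/* A rename is still a real writer and remains exclusive */
	write_seqlock(&rename_lock);		/* -> write_lock() */
	/* ... __d_move() ... */
	write_sequnlock(&rename_lock);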

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 fs/dcache.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 480c81f..75299cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -82,7 +82,7 @@ int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
-__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
+__cacheline_aligned_in_smp DEFINE_SEQRWLOCK(rename_lock);
 
 EXPORT_SYMBOL(rename_lock);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock
  2013-07-04  1:52 [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock Waiman Long
                   ` (3 preceding siblings ...)
  2013-07-04  1:52 ` [PATCH 4/4] dcache: Use rwlock as the underlying lock in rename_lock Waiman Long
@ 2013-08-23 22:49 ` Waiman Long
  4 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-08-23 22:49 UTC (permalink / raw)
  To: Alexander Viro, Thomas Gleixner
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J

On 07/03/2013 09:52 PM, Waiman Long wrote:
> During some perf-record sessions of the kernel running the high_systime
> workload of the AIM7 benchmark, it was found that quite a large portion
> of the spinlock contention was due to the perf_event_mmap_event()
> function itself. This perf kernel function calls d_path() which,
> in turn, calls path_get() and dput() indirectly. These 3 functions
> were the hottest functions shown in the perf-report output of
> the _raw_spin_lock() function in an 8-socket system with 80 cores
> (hyperthreading off) with a 3.10-rc1 kernel:
>
> -  13.91%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
>     - _raw_spin_lock
>        + 35.54% path_get
>        + 34.85% dput
>        + 19.49% d_path
>
> In fact, the output of the "perf record -s -a" (without call-graph)
> showed:
>
>   13.37%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
>    7.61%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
>    3.54%            true  [kernel.kallsyms]     [k] _raw_spin_lock
>
> Without the perf monitoring tool running, the actual execution profile
> is quite different. In fact, with this patch set and my other
> lockless reference count update patch applied, the output of the same
> "perf record -s -a" command became:
>
>    2.82%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
>    1.11%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
>    0.26%            true  [kernel.kallsyms]     [k] _raw_spin_lock
>
> So the time spent in the _raw_spin_lock() function went down from 24.52%
> to 4.19%. It can be seen that the performance data collected by the
> perf-record command can be heavily skewed in some cases on a system
> with a large number of CPUs. This set of patches enables the perf
> command to give a more accurate and reliable picture of what is really
> happening in the system. At the same time, they can also improve the
> general performance of systems, especially those with a large number
> of CPUs.
>
> The d_path() function takes the following two locks:
>
> 1. dentry->d_lock [spinlock] from dget()/dget_parent()/dput()
> 2. rename_lock    [seqlock]  from d_path()
>
> This set of patches addresses the rename_lock bottleneck by changing the
> way seqlock is implemented so that we can optionally use a read/write
> lock as the underlying implementation instead of the default spinlock.
>
> Incidentally, this patch set also provides a slight 5% performance boost
> over just the lockless reference count update patch in the short
> workload of the AIM7 benchmark running on an 8-socket 80-core DL980
> machine on a 3.10-based kernel. A few percentage points of contention
> were still left in d_path() and the getcwd syscall due to their use of
> the rename_lock.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>
> Waiman Long (4):
>    seqlock: Add a new blocking reader type
>    dcache: Use blocking reader seqlock when protected data are not
>      changed
>    seqlock: Allow the use of rwlock in seqlock
>    dcache: Use rwlock as the underlying lock in rename_lock
>
>   fs/dcache.c             |   28 ++++----
>   include/linux/seqlock.h |  167 ++++++++++++++++++++++++++++++++++++++++-------
>   2 files changed, 158 insertions(+), 37 deletions(-)
>

I haven't received any feedback on this patchset. Would you mind letting
me know if any further changes are needed to make it acceptable for
merging?

Thanks,
Longman

^ permalink raw reply	[flat|nested] 6+ messages in thread
