All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/1] dcache: Translating dentry into pathname without taking rename_lock
@ 2013-09-06 16:08 Waiman Long
  2013-09-06 16:08 ` [PATCH v3 1/1] " Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2013-09-06 16:08 UTC (permalink / raw)
  To: Alexander Viro, Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J, George Spelvin, John Stoffel

Change History

v2->v3:
 - Simplify prepend_name() to blindly copy the dname over until it
   reaches a null byte or the specified length leaving the sequence
   check to handle error case.

v1->v2:
 - Check for internal vs external dname, taking d_lock only for
   external dname for safety.
 - Replace memchr() by a byte-by-byte checking for loop.
 - Try lockless dentry to pathname conversion 3 times before falling
   back to taking the rename_lock to prevent live-lock.
 - Make code re-factoring suggested by George Spelvin.

Waiman Long (1):
  dcache: Translating dentry into pathname without taking rename_lock

 fs/dcache.c |  178 ++++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 116 insertions(+), 62 deletions(-)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-06 16:08 [PATCH v3 0/1] dcache: Translating dentry into pathname without taking rename_lock Waiman Long
@ 2013-09-06 16:08 ` Waiman Long
  2013-09-06 20:52   ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2013-09-06 16:08 UTC (permalink / raw)
  To: Alexander Viro, Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Chandramouleeswaran,
	Aswin, Norton, Scott J, George Spelvin, John Stoffel

When running the AIM7's short workload, Linus' lockref patch eliminated
most of the spinlock contention. However, there were still some left:

     8.46%     reaim  [kernel.kallsyms]     [k] _raw_spin_lock
                 |--42.21%-- d_path
                 |          proc_pid_readlink
                 |          SyS_readlinkat
                 |          SyS_readlink
                 |          system_call
                 |          __GI___readlink
                 |
                 |--40.97%-- sys_getcwd
                 |          system_call
                 |          __getcwd

The big one here is the rename_lock (seqlock) contention in d_path()
and the getcwd system call. This patch will eliminate the need to take
the rename_lock while translating dentries into the full pathnames.

The need to take the rename_lock is to make sure that no rename
operation can be ongoing while the translation is in progress. However,
only one thread can take the rename_lock thus blocking all the other
threads that need it even though the translation process won't make
any change to the dentries.

This patch will replace the writer's write_seqlock/write_sequnlock
sequence of the rename_lock of the callers of the prepend_path() and
__dentry_path() functions with the reader's read_seqbegin/read_seqretry
sequence within these 2 functions. As a result, the code will have to
retry if one or more rename operations had been performed. In addition,
RCU read lock will be taken during the translation process to make sure
that no dentries will go away. To prevent live-lock from happening,
the code will switch back to take the rename_lock if read_seqretry()
fails for three times.

To further reduce spinlock contention, this patch does not take the
dentry's d_lock when copying the filename from the dentries. Instead,
it treats the name pointer and length as unreliable and just copy
the string byte-by-byte over until it hits a null byte or the end of
string as specified by the length. This should avoid stepping into
invalid memory address. The error cases are left to be handled by
the sequence number check.

The following code re-factoring are also made:
1. Move prepend('/') into prepend_name() to remove one conditional
   check.
2. Move the global root check in prepend_path() back to the top of
   the while loop.

With this patch, the _raw_spin_lock will now account for only 1.2%
of the total CPU cycles for the short workload. This patch also has
the effect of reducing the effect of running perf on its profile
since the perf command itself can be a heavy user of the d_path()
function depending on the complexity of the workload.

When taking the perf profile of the high-systime workload, the amount
of spinlock contention contributed by running perf without this patch
was about 16%. With this patch, the spinlock contention caused by
the running of perf will go away and we will have a more accurate
perf profile.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 fs/dcache.c |  178 ++++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 116 insertions(+), 62 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 96655f4..da095ce 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -89,6 +89,12 @@ EXPORT_SYMBOL(rename_lock);
 static struct kmem_cache *dentry_cache __read_mostly;
 
 /*
+ * Try 3 times of lockless dentry to pathname conversion before falling
+ * back to take the rename_lock.
+ */
+#define	D_LOCKLESS_PREPEND_RETRY	3
+
+/*
  * This is the single most critical data structure when it comes
  * to the dcache: the hashtable for lookups. Somebody should try
  * to make this good - I've just made it work.
@@ -2512,7 +2518,33 @@ static int prepend(char **buffer, int *buflen, const char *str, int namelen)
 
 static int prepend_name(char **buffer, int *buflen, struct qstr *name)
 {
-	return prepend(buffer, buflen, name->name, name->len);
+	/*
+	 * With RCU path tracing, it may race with d_move(). Use
+	 * ACCESS_ONCE() to make sure that either the old or the new name
+	 * pointer and length are fetched. However, there may be mismatch
+	 * between length and pointer. The length cannot be trusted, we need
+	 * to copy it byte-by-byte until the length is reached or a null
+	 * byte is found. It also prepends "/" at the beginning of the name.
+	 * The sequence number check at the caller will retry it again when
+	 * a d_move() does happen. So any garbage in the buffer due to
+	 * mismatched pointer and length will be discarded.
+	 */
+	const char *dname = ACCESS_ONCE(name->name);
+	u32 dlen = ACCESS_ONCE(name->len);
+	char *p;
+
+	if (*buflen < dlen + 1)
+		return -ENAMETOOLONG;
+	*buflen -= dlen + 1;
+	p = *buffer -= dlen + 1;
+	*p++ = '/';
+	while (dlen--) {
+		char c = *dname++;
+		if (!c)
+			break;
+		*p++ = c;
+	}
+	return 0;
 }
 
 /**
@@ -2522,7 +2554,14 @@ static int prepend_name(char **buffer, int *buflen, struct qstr *name)
  * @buffer: pointer to the end of the buffer
  * @buflen: pointer to buffer length
  *
- * Caller holds the rename_lock.
+ * The function tries to write out the pathname without taking any lock other
+ * than the RCU read lock to make sure that dentries won't go away. It only
+ * checks the sequence number of the global rename_lock as any change in the
+ * dentry's d_seq will be preceded by changes in the rename_lock sequence
+ * number. If the sequence number had been change, it will restart the whole
+ * pathname back-tracing sequence again. It performs a total of 3 trials of
+ * lockless back-tracing sequences before falling back to take the
+ * rename_lock.
  */
 static int prepend_path(const struct path *path,
 			const struct path *root,
@@ -2531,54 +2570,70 @@ static int prepend_path(const struct path *path,
 	struct dentry *dentry = path->dentry;
 	struct vfsmount *vfsmnt = path->mnt;
 	struct mount *mnt = real_mount(vfsmnt);
-	bool slash = false;
 	int error = 0;
+	unsigned seq = 0;
+	char *bptr;
+	int blen;
+	int retry_cnt = D_LOCKLESS_PREPEND_RETRY;
 
+restart:
+	bptr = *buffer;
+	blen = *buflen;
+	if (retry_cnt) {
+		seq = read_seqbegin(&rename_lock);
+		rcu_read_lock();
+	} else
+		write_seqlock(&rename_lock);
 	while (dentry != root->dentry || vfsmnt != root->mnt) {
 		struct dentry * parent;
 
 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
 			/* Global root? */
-			if (!mnt_has_parent(mnt))
-				goto global_root;
-			dentry = mnt->mnt_mountpoint;
-			mnt = mnt->mnt_parent;
-			vfsmnt = &mnt->mnt;
-			continue;
+			if (mnt_has_parent(mnt)) {
+				dentry = mnt->mnt_mountpoint;
+				mnt = mnt->mnt_parent;
+				vfsmnt = &mnt->mnt;
+				continue;
+			}
+			/*
+			 * Filesystems needing to implement special "root names"
+			 * should do so with ->d_dname()
+			 */
+			if (IS_ROOT(dentry) &&
+			   (dentry->d_name.len != 1 ||
+			    dentry->d_name.name[0] != '/')) {
+				WARN(1, "Root dentry has weird name <%.*s>\n",
+				     (int) dentry->d_name.len,
+				     dentry->d_name.name);
+			}
+			if (!error)
+				error = is_mounted(vfsmnt) ? 1 : 2;
+			break;
 		}
 		parent = dentry->d_parent;
 		prefetch(parent);
-		spin_lock(&dentry->d_lock);
-		error = prepend_name(buffer, buflen, &dentry->d_name);
-		spin_unlock(&dentry->d_lock);
-		if (!error)
-			error = prepend(buffer, buflen, "/", 1);
+		error = prepend_name(&bptr, &blen, &dentry->d_name);
 		if (error)
 			break;
 
-		slash = true;
 		dentry = parent;
 	}
+	if (retry_cnt) {
+		retry_cnt--;
+		rcu_read_unlock();
+		if (read_seqretry(&rename_lock, seq))
+			goto restart;
+	} else
+		write_sequnlock(&rename_lock);
 
-	if (!error && !slash)
-		error = prepend(buffer, buflen, "/", 1);
-
-	return error;
-
-global_root:
-	/*
-	 * Filesystems needing to implement special "root names"
-	 * should do so with ->d_dname()
-	 */
-	if (IS_ROOT(dentry) &&
-	    (dentry->d_name.len != 1 || dentry->d_name.name[0] != '/')) {
-		WARN(1, "Root dentry has weird name <%.*s>\n",
-		     (int) dentry->d_name.len, dentry->d_name.name);
-	}
-	if (!slash)
-		error = prepend(buffer, buflen, "/", 1);
-	if (!error)
-		error = is_mounted(vfsmnt) ? 1 : 2;
+	if (error >= 0 && bptr == *buffer) {
+		if (--blen < 0)
+			error = -ENAMETOOLONG;
+		else
+			*--bptr = '/';
+	}
+	*buffer = bptr;
+	*buflen = blen;
 	return error;
 }
 
@@ -2607,9 +2662,7 @@ char *__d_path(const struct path *path,
 
 	prepend(&res, &buflen, "\0", 1);
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
 	error = prepend_path(path, root, &res, &buflen);
-	write_sequnlock(&rename_lock);
 	br_read_unlock(&vfsmount_lock);
 
 	if (error < 0)
@@ -2628,9 +2681,7 @@ char *d_absolute_path(const struct path *path,
 
 	prepend(&res, &buflen, "\0", 1);
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
 	error = prepend_path(path, &root, &res, &buflen);
-	write_sequnlock(&rename_lock);
 	br_read_unlock(&vfsmount_lock);
 
 	if (error > 1)
@@ -2696,9 +2747,7 @@ char *d_path(const struct path *path, char *buf, int buflen)
 
 	get_fs_root(current->fs, &root);
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
 	error = path_with_deleted(path, &root, &res, &buflen);
-	write_sequnlock(&rename_lock);
 	br_read_unlock(&vfsmount_lock);
 	if (error < 0)
 		res = ERR_PTR(error);
@@ -2733,10 +2782,10 @@ char *simple_dname(struct dentry *dentry, char *buffer, int buflen)
 	char *end = buffer + buflen;
 	/* these dentries are never renamed, so d_lock is not needed */
 	if (prepend(&end, &buflen, " (deleted)", 11) ||
-	    prepend_name(&end, &buflen, &dentry->d_name) ||
+	    prepend(&end, &buflen, dentry->d_name.name, dentry->d_name.len) ||
 	    prepend(&end, &buflen, "/", 1))  
 		end = ERR_PTR(-ENAMETOOLONG);
-	return end;  
+	return end;
 }
 
 /*
@@ -2744,30 +2793,46 @@ char *simple_dname(struct dentry *dentry, char *buffer, int buflen)
  */
 static char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
 {
-	char *end = buf + buflen;
-	char *retval;
+	char *end, *retval;
+	int len, seq = 0;
+	int error = 0;
+	int retry_cnt = D_LOCKLESS_PREPEND_RETRY;
 
-	prepend(&end, &buflen, "\0", 1);
+restart:
+	end = buf + buflen;
+	len = buflen;
+	prepend(&end, &len, "\0", 1);
 	if (buflen < 1)
 		goto Elong;
 	/* Get '/' right */
 	retval = end-1;
 	*retval = '/';
-
+	if (retry_cnt) {
+		seq = read_seqbegin(&rename_lock);
+		rcu_read_lock();
+	} else
+		write_seqlock(&rename_lock);
 	while (!IS_ROOT(dentry)) {
 		struct dentry *parent = dentry->d_parent;
 		int error;
 
 		prefetch(parent);
-		spin_lock(&dentry->d_lock);
-		error = prepend_name(&end, &buflen, &dentry->d_name);
-		spin_unlock(&dentry->d_lock);
-		if (error != 0 || prepend(&end, &buflen, "/", 1) != 0)
-			goto Elong;
+		error = prepend_name(&end, &len, &dentry->d_name);
+		if (error)
+			break;
 
 		retval = end;
 		dentry = parent;
 	}
+	if (retry_cnt) {
+		retry_cnt--;
+		rcu_read_unlock();
+		if (read_seqretry(&rename_lock, seq))
+			goto restart;
+	} else
+		write_sequnlock(&rename_lock);
+	if (error)
+		goto Elong;
 	return retval;
 Elong:
 	return ERR_PTR(-ENAMETOOLONG);
@@ -2775,13 +2840,7 @@ Elong:
 
 char *dentry_path_raw(struct dentry *dentry, char *buf, int buflen)
 {
-	char *retval;
-
-	write_seqlock(&rename_lock);
-	retval = __dentry_path(dentry, buf, buflen);
-	write_sequnlock(&rename_lock);
-
-	return retval;
+	return __dentry_path(dentry, buf, buflen);
 }
 EXPORT_SYMBOL(dentry_path_raw);
 
@@ -2790,7 +2849,6 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen)
 	char *p = NULL;
 	char *retval;
 
-	write_seqlock(&rename_lock);
 	if (d_unlinked(dentry)) {
 		p = buf + buflen;
 		if (prepend(&p, &buflen, "//deleted", 10) != 0)
@@ -2798,7 +2856,6 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen)
 		buflen++;
 	}
 	retval = __dentry_path(dentry, buf, buflen);
-	write_sequnlock(&rename_lock);
 	if (!IS_ERR(retval) && p)
 		*p = '/';	/* restore '/' overriden with '\0' */
 	return retval;
@@ -2837,7 +2894,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
 
 	error = -ENOENT;
 	br_read_lock(&vfsmount_lock);
-	write_seqlock(&rename_lock);
 	if (!d_unlinked(pwd.dentry)) {
 		unsigned long len;
 		char *cwd = page + PAGE_SIZE;
@@ -2845,7 +2901,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
 
 		prepend(&cwd, &buflen, "\0", 1);
 		error = prepend_path(&pwd, &root, &cwd, &buflen);
-		write_sequnlock(&rename_lock);
 		br_read_unlock(&vfsmount_lock);
 
 		if (error < 0)
@@ -2866,7 +2921,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
 				error = -EFAULT;
 		}
 	} else {
-		write_sequnlock(&rename_lock);
 		br_read_unlock(&vfsmount_lock);
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-06 16:08 ` [PATCH v3 1/1] " Waiman Long
@ 2013-09-06 20:52   ` Linus Torvalds
  2013-09-06 21:05     ` Al Viro
  2013-09-07  2:24     ` Waiman Long
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2013-09-06 20:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 6, 2013 at 9:08 AM, Waiman Long <Waiman.Long@hp.com> wrote:
>
> This patch will replace the writer's write_seqlock/write_sequnlock
> sequence of the rename_lock of the callers of the prepend_path() and
> __dentry_path() functions with the reader's read_seqbegin/read_seqretry
> sequence within these 2 functions.

Ok, this actually looks really good.

I do have one comment, from just reading the patch:

I would really like the stuff inside the

   restart:
      bptr = *buffer;
      blen = *buflen;
      if (retry_cnt) {
            seq = read_seqbegin(&rename_lock);
            rcu_read_lock();
      } else
            write_seqlock(&rename_lock);

      ... guts of path generation ...

      if (retry_cnt) {
            retry_cnt--;
            rcu_read_unlock();
            if (read_seqretry(&rename_lock, seq))
                  goto restart;
      } else
            write_sequnlock(&rename_lock);

could possible be done as a separate function?

Alternatively (or perhaps additionally), maybe the locking could be
done as an inline function too (taking the retry count as an argument)
to make things a bit more easy to understand.

Right now there is a lot of fairly subtle things going on in that
__dentry_path() function. It's not a huge function, but I think that
"while()" loop inside that locking could be done as its own function
and make it even more readable.

But I could already apply this as-is, so it's not a big deal.

Al - do you have comments? Do you want to take this through your tree,
or are you working on other things? I can take this directly too..

                 Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-06 20:52   ` Linus Torvalds
@ 2013-09-06 21:05     ` Al Viro
  2013-09-06 21:48       ` Linus Torvalds
  2013-09-07  2:24     ` Waiman Long
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2013-09-06 21:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 06, 2013 at 01:52:49PM -0700, Linus Torvalds wrote:
 
> Al - do you have comments? Do you want to take this through your tree,
> or are you working on other things? I can take this directly too..

I can take that, but I'm really not convinced that we need writer lock
there at all.  After all, if we really can get livelocks on that one,
we would be getting them on d_lookup()...

Let's not complicate the things without need; if we ever run into loads
where livelock start to happen, we can look into dealing with them.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-06 21:05     ` Al Viro
@ 2013-09-06 21:48       ` Linus Torvalds
  2013-09-07  0:00         ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-09-06 21:48 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 6, 2013 at 2:05 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I can take that, but I'm really not convinced that we need writer lock
> there at all.  After all, if we really can get livelocks on that one,
> we would be getting them on d_lookup()...

d_lookup() does a _single_ path component. That's a *big* difference.

Sure, the hash chain that d_lookup() (well, __d_lookup()) ends up
walking is a bit more complicated than just following the dentry
parent pointer, but that's much harder to trigger than just creating a
really deep directory structure of single-letter nested directories,
and then doing a "getcwd()" that walks 1024+ parents, while another
thread is looping renaming things..

So I personally do feel a lot safer with the fallback to write locking here.

Especially since it's pretty simple, so there isn't really much downside.

             Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-06 21:48       ` Linus Torvalds
@ 2013-09-07  0:00         ` Al Viro
  2013-09-07  0:19           ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2013-09-07  0:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 06, 2013 at 02:48:32PM -0700, Linus Torvalds wrote:
> On Fri, Sep 6, 2013 at 2:05 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > I can take that, but I'm really not convinced that we need writer lock
> > there at all.  After all, if we really can get livelocks on that one,
> > we would be getting them on d_lookup()...
> 
> d_lookup() does a _single_ path component. That's a *big* difference.
> 
> Sure, the hash chain that d_lookup() (well, __d_lookup()) ends up
> walking is a bit more complicated than just following the dentry
> parent pointer, but that's much harder to trigger than just creating a
> really deep directory structure of single-letter nested directories,
> and then doing a "getcwd()" that walks 1024+ parents, while another
> thread is looping renaming things..
> 
> So I personally do feel a lot safer with the fallback to write locking here.
> 
> Especially since it's pretty simple, so there isn't really much downside.

Er... what will happen if you have done just what you've described and have
a process call d_lookup()?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07  0:00         ` Al Viro
@ 2013-09-07  0:19           ` Linus Torvalds
  2013-09-07  0:58             ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-09-07  0:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 6, 2013 at 5:00 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Er... what will happen if you have done just what you've described and have
> a process call d_lookup()?

Umm. Yes?

What part of "one single path component" did you not get?

To repeat: d_lookup() NEVER LOOKS UP A WHOLE PATHNAME. It looks up
just a single path component. It matters not one whit whether you look
up a filename that is 1500 components deep, d_lookup() only ever works
on one single component. It's a single short hash chain lookup.

Sure, it can fail, but people really have to work at it. You're not
going to be able to make it fail by just calling "rename()" in a loop.
You're going to have to do multiple threads at least, and now you need
to do it on multiple different filesystems, since otherwise those
multiple threads are going to be serialized by the (outer)
per-filesystem vfs-layer rename semaphores. In other words, it sounds
impossible to trigger, wouldn't you say? Or if you try, you're going
to stand out for using a *lot* of resources.

In contrast, doing the getcwd() lookup really is following potentially
quite long chains.

So it's quite possible that just a single thread doing rename() in a
loop (on, say, /tmp, so that there isn't any IO) can trigger the
sequence write-lock frequently enough that traversing 1500 d_parent
entries might never complete.

Have I tried it? No. But think about the two different scenarios.
There really is a *big* difference between looking up one single
dentry from its parent using our hash tables, and traversing a
potentially almost unbounded parenthood chain.

(We're bounded in practice by PATH_MAX, so you can't make getcwd()
traverse more than about 2000 parents (single character filename plus
the slash for each level), and for all I know filesystems might cap it
before that, so it's not unbounded, but the difference between "1" and
"2000" is pretty damn big)

             Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07  0:19           ` Linus Torvalds
@ 2013-09-07  0:58             ` Linus Torvalds
  2013-09-07  3:01               ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-09-07  0:58 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 6, 2013 at 5:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> (We're bounded in practice by PATH_MAX, so you can't make getcwd()
> traverse more than about 2000 parents (single character filename plus
> the slash for each level), and for all I know filesystems might cap it
> before that, so it's not unbounded, but the difference between "1" and
> "2000" is pretty damn big)

.. in particular, it's big enough that one is pretty much guaranteed
to fit in any reasonable L1 cache (if we have dentry hash chains so
long that that becomes a problem for traversing a single chain, we're
screwed anyway), while the other can most likely be a case of "not a
single L1 cache hit because by the time you fail and go back to the
start, you've flushed the L1 cache".

Now, whether 2000 L2 cache misses is long enough to give people a
chance to run the whole rename system call path in a loop a few times,
I don't know, but it sure as heck sounds likely.

Of course, you might still ask "why should we even care?" At least
without preemption, you might be able to trigger some really excessive
latencies and possibly a watchdog screaming at you as a result. But
that said, maybe we wouldn't care. I just think that the solution is
so simple (what, five extra lines or so) that it's worth avoiding even
the worry.

          Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-06 20:52   ` Linus Torvalds
  2013-09-06 21:05     ` Al Viro
@ 2013-09-07  2:24     ` Waiman Long
  1 sibling, 0 replies; 18+ messages in thread
From: Waiman Long @ 2013-09-07  2:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On 09/06/2013 04:52 PM, Linus Torvalds wrote:
> On Fri, Sep 6, 2013 at 9:08 AM, Waiman Long<Waiman.Long@hp.com>  wrote:
>> This patch will replace the writer's write_seqlock/write_sequnlock
>> sequence of the rename_lock of the callers of the prepend_path() and
>> __dentry_path() functions with the reader's read_seqbegin/read_seqretry
>> sequence within these 2 functions.
> Ok, this actually looks really good.
>
> I do have one comment, from just reading the patch:
>
> I would really like the stuff inside the
>
>     restart:
>        bptr = *buffer;
>        blen = *buflen;
>        if (retry_cnt) {
>              seq = read_seqbegin(&rename_lock);
>              rcu_read_lock();
>        } else
>              write_seqlock(&rename_lock);
>
>        ... guts of path generation ...
>
>        if (retry_cnt) {
>              retry_cnt--;
>              rcu_read_unlock();
>              if (read_seqretry(&rename_lock, seq))
>                    goto restart;
>        } else
>              write_sequnlock(&rename_lock);
>
> could possible be done as a separate function?
>
> Alternatively (or perhaps additionally), maybe the locking could be
> done as an inline function too (taking the retry count as an argument)
> to make things a bit more easy to understand.

I would prefer putting the begin and end blocks into 2 helper inlined 
helper functions to make code easier to look at. I will work on this 
over the weekend.

-Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07  0:58             ` Linus Torvalds
@ 2013-09-07  3:01               ` Al Viro
  2013-09-07 17:32                 ` Al Viro
  2013-09-07 17:52                 ` Linus Torvalds
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2013-09-07  3:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 06, 2013 at 05:58:51PM -0700, Linus Torvalds wrote:
> On Fri, Sep 6, 2013 at 5:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > (We're bounded in practice by PATH_MAX, so you can't make getcwd()
> > traverse more than about 2000 parents (single character filename plus
> > the slash for each level), and for all I know filesystems might cap it
> > before that, so it's not unbounded, but the difference between "1" and
> > "2000" is pretty damn big)
> 
> .. in particular, it's big enough that one is pretty much guaranteed
> to fit in any reasonable L1 cache (if we have dentry hash chains so
> long that that becomes a problem for traversing a single chain, we're
> screwed anyway), while the other can most likely be a case of "not a
> single L1 cache hit because by the time you fail and go back to the
> start, you've flushed the L1 cache".
> 
> Now, whether 2000 L2 cache misses is long enough to give people a
> chance to run the whole rename system call path in a loop a few times,
> I don't know, but it sure as heck sounds likely.
> 
> Of course, you might still ask "why should we even care?" At least
> without preemption, you might be able to trigger some really excessive
> latencies and possibly a watchdog screaming at you as a result. But
> that said, maybe we wouldn't care. I just think that the solution is
> so simple (what, five extra lines or so) that it's worth avoiding even
> the worry.

We already have that kind of logics - see select_parent() et.al. in
mainline or d_walk() in vfs.git#for-linus (pull request will go in
a few minutes).  With this patch we get

	* plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
[audit] handle_path())
	* try seqretry once, then switch to write_seqlock() (the things
that got unified into d_walk())
	* try seqretry three times, then switch to write_seqlock() (d_path()
and friends)
	* several pure write_seqlock() users (d_move(), d_set_mounted(),
d_materialize_unique())

The last class is not a problem - these we want as writers.  I really don't
like the way the rest is distributed - if nothing else, nfs_path() and
friends are in exactly the same situation as d_path().  Moreover, why
the distinction between "try once" and "try thrice"?

_If_ we fold the second and the third groups together (and probably have
a bunch from the first one join that), we at least get something
understandable, but the I really wonder if seqlock has the right calling
conventions for that (and at least I'd like to fold the "already got writelock"
flag into seq - we do have a spare bit there).

Comments?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07  3:01               ` Al Viro
@ 2013-09-07 17:32                 ` Al Viro
  2013-09-08  4:15                   ` Ian Kent
  2013-09-07 17:52                 ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2013-09-07 17:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel, Sage Weil, Ian Kent

On Sat, Sep 07, 2013 at 04:01:10AM +0100, Al Viro wrote:
> 	* plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
> ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
      _mds_, actually - sorry.
> [audit] handle_path())
> 	* try seqretry once, then switch to write_seqlock() (the things
> that got unified into d_walk())
> 	* try seqretry three times, then switch to write_seqlock() (d_path()
> and friends)
> 	* several pure write_seqlock() users (d_move(), d_set_mounted(),
> d_materialize_unique())

BTW, autofs4_getpath() looks really odd:
static int autofs4_getpath(struct autofs_sb_info *sbi,
                           struct dentry *dentry, char **name)
and *name is never modified in it.  Why not simply pass it by value?
Moreover, I'm not sure I understand what do we need sbi->fs_lock in
there.  Other than that, it's very close to dentry_path() (well, that
and different calling conventions).  Ian?

ceph_mds_build_path() is similar, but it does kmalloc to store the result
and grabs ->d_lock for ->d_name protection.  This one, BTW, is much more
likely to get stalls - it ends up doing kmalloc on each attempt (after
having calculated the current length).  Bugger if I understand what's wrong
with simply grabbing a page and doing that once - before everything else...

build_path_from_dentry() - same story, might very well have been the source
of ceph_mds_build_path().

nfs_path() - not far from open-coded dentry_path() with some postprocessing,
uses ->d_lock for ->d_name protection.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07  3:01               ` Al Viro
  2013-09-07 17:32                 ` Al Viro
@ 2013-09-07 17:52                 ` Linus Torvalds
  2013-09-07 18:07                   ` Al Viro
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-09-07 17:52 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Fri, Sep 6, 2013 at 8:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         * plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
> ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
> [audit] handle_path())
>         * try seqretry once, then switch to write_seqlock() (the things
> that got unified into d_walk())
>         * try seqretry three times, then switch to write_seqlock() (d_path()
> and friends)
>         * several pure write_seqlock() users (d_move(), d_set_mounted(),
> d_materialize_unique())
>
> The last class is not a problem - these we want as writers.  I really don't
> like the way the rest is distributed - if nothing else, nfs_path() and
> friends are in exactly the same situation as d_path().  Moreover, why
> the distinction between "try once" and "try thrice"?
>
> _If_ we fold the second and the third groups together (and probably have
> a bunch from the first one join that), we at least get something
> understandable, but the I really wonder if seqlock has the right calling
> conventions for that (and at least I'd like to fold the "already got writelock"
> flag into seq - we do have a spare bit there).
>
> Comments?

So I think we could make a more complicated data structure that looks
something like this:

   struct seqlock_retry {
      unsigned int seq_no;
      int state;
   };

and pass that around. Gcc should do pretty well, especially if we
inline things (but even if not, small structures that fit in 64 bytes
generate reasonable code even on 32-bit targets, because gcc knows
about using two registers for passing data around)..

Then you can make "state" have a retry counter in it, and have a
negative value mean "I hold the lock for writing". Add a couple of
helper functions, and you can fairly easily handle the mixed "try for
reading first, then fall back to writing".

That said, __d_lookup() still shows up as very performance-critical on
some loads (symlinks in particular cause us to fall out of the RCU
cases) so I'd like to keep that using the simple pure read case. I
don't believe you can livelock it, as mentioned. But the other ones
might well be worth moving to a "fall back to write-locking after <n>
tries" model. They might all traverse user-specified paths of fairly
arbitrary depth, no?

So this "seqlock_retry" thing wouldn't _replace_ bare seqlocks, it
would just be a helper thing for this kind of behavior where we want
to normally do things with just the read-lock, but want to guarantee
that we don't live-lock.

Sounds reasonable?

                 Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07 17:52                 ` Linus Torvalds
@ 2013-09-07 18:07                   ` Al Viro
  2013-09-07 18:53                     ` Al Viro
  2013-09-09 14:31                     ` Waiman Long
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2013-09-07 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Sat, Sep 07, 2013 at 10:52:02AM -0700, Linus Torvalds wrote:

> So I think we could make a more complicated data structure that looks
> something like this:
> 
>    struct seqlock_retry {
>       unsigned int seq_no;
>       int state;
>    };
> 
> and pass that around. Gcc should do pretty well, especially if we
> inline things (but even if not, small structures that fit in 64 bytes
> generate reasonable code even on 32-bit targets, because gcc knows
> about using two registers for passing data around)..
> 
> Then you can make "state" have a retry counter in it, and have a
> negative value mean "I hold the lock for writing". Add a couple of
> helper functions, and you can fairly easily handle the mixed "try for
> reading first, then fall back to writing".
> 
> That said, __d_lookup() still shows up as very performance-critical on
> some loads (symlinks in particular cause us to fall out of the RCU
> cases) so I'd like to keep that using the simple pure read case. I
> don't believe you can livelock it, as mentioned. But the other ones
> might well be worth moving to a "fall back to write-locking after <n>
> tries" model. They might all traverse user-specified paths of fairly
> arbitrary depth, no?
> 
> So this "seqlock_retry" thing wouldn't _replace_ bare seqlocks, it
> would just be a helper thing for this kind of behavior where we want
> to normally do things with just the read-lock, but want to guarantee
> that we don't live-lock.
> 
> Sounds reasonable?

More or less; I just wonder if we are overdesigning here - if we don't
do "repeat more than once", we can simply use the lower bit of seq -
read_seqlock() always returns an even value.  So we could do something
like seqretry_and_lock(lock, &seq):
	if ((*seq & 1) || !read_seqretry(lock, *seq))
		return true;
	*seq |= 1;
	write_seqlock(lock);
	return false;
and seqretry_done(lock, seq):
	if (seq & 1)
		write_sequnlock(lock);
with these loops turning into
	seq = read_seqlock(&rename_lock);
	...
	if (!seqretry_and_lock(&rename_lock, &seq))
		goto again;
	...
	seqretry_done(&rename_lock);

But I'd really like to understand the existing zoo - in particular, ceph and
cifs users can't be converted to anything of that kind (blocking kmalloc()
can't live under write_seqlock()) and they are _easier_ to livelock than
d_path(), due to the same kmalloc() widening the window.  Guys, do we really
care about precisely-sized allocations there?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07 18:07                   ` Al Viro
@ 2013-09-07 18:53                     ` Al Viro
  2013-09-09 14:31                     ` Waiman Long
  1 sibling, 0 replies; 18+ messages in thread
From: Al Viro @ 2013-09-07 18:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On Sat, Sep 07, 2013 at 07:07:24PM +0100, Al Viro wrote:

> with these loops turning into
> 	seq = read_seqlock(&rename_lock);
again:
> 	...
> 	if (!seqretry_and_lock(&rename_lock, &seq))
> 		goto again;
> 	...
> 	seqretry_done(&rename_lock);

Forgot the label, sorry.

And I agree that d_lookup() ought to stay as is - this is just about
the ones that try readlock once and then fall back to writer.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07 17:32                 ` Al Viro
@ 2013-09-08  4:15                   ` Ian Kent
  2013-09-08  4:58                     ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Kent @ 2013-09-08  4:15 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Waiman Long, linux-fsdevel,
	Linux Kernel Mailing List, Chandramouleeswaran, Aswin, Norton,
	Scott J, George Spelvin, John Stoffel, Sage Weil

On Sat, 2013-09-07 at 18:32 +0100, Al Viro wrote:
> On Sat, Sep 07, 2013 at 04:01:10AM +0100, Al Viro wrote:
> > 	* plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
> > ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
>       _mds_, actually - sorry.
> > [audit] handle_path())
> > 	* try seqretry once, then switch to write_seqlock() (the things
> > that got unified into d_walk())
> > 	* try seqretry three times, then switch to write_seqlock() (d_path()
> > and friends)
> > 	* several pure write_seqlock() users (d_move(), d_set_mounted(),
> > d_materialize_unique())
> 
> BTW, autofs4_getpath() looks really odd:
> static int autofs4_getpath(struct autofs_sb_info *sbi,
>                            struct dentry *dentry, char **name)
> and *name is never modified in it.  Why not simply pass it by value?
> Moreover, I'm not sure I understand what do we need sbi->fs_lock in
> there.  Other than that, it's very close to dentry_path() (well, that
> and different calling conventions).  Ian?

Yes, it is close to dentry_path() but the path functions used to return
the full path to the root, although I don't see where dentry_path() get
the root, mmm ... 

autofs4_getpath() is supposed to return a path relative to the current
dentry.

That goes back to the pseudo direct mounts of autofs version 4 where the
keys could be a relative path.

The fs_lock probably isn't needed. This was a tree of directories and I
didn't want mount requests coming in while I was getting the path, and
didn't want an expire to happen either, although there should only be
one expire process anyway.

Ian



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-08  4:15                   ` Ian Kent
@ 2013-09-08  4:58                     ` Al Viro
  2013-09-08  8:51                       ` Ian Kent
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2013-09-08  4:58 UTC (permalink / raw)
  To: Ian Kent
  Cc: Linus Torvalds, Waiman Long, linux-fsdevel,
	Linux Kernel Mailing List, Chandramouleeswaran, Aswin, Norton,
	Scott J, George Spelvin, John Stoffel, Sage Weil

On Sun, Sep 08, 2013 at 12:15:46PM +0800, Ian Kent wrote:
> > and *name is never modified in it.  Why not simply pass it by value?
> > Moreover, I'm not sure I understand what do we need sbi->fs_lock in
> > there.  Other than that, it's very close to dentry_path() (well, that
> > and different calling conventions).  Ian?
> 
> Yes, it is close to dentry_path() but the path functions used to return
> the full path to the root, although I don't see where dentry_path() get
> the root, mmm ... 

dentry_path(), unlike d_path(), is relative to the root of filesystem
containing dentry in question.  There are 3 of those suckers:
	d_path(): vfsmount/dentry => path to current process' root
	d_absolute_path(): ditto, but ignores chroot jails (goes to
the absolute root of namespace)
	dentry_path(): dentry => path from root of fs containing that
dentry.

IOW, if you have something mounted on /jail/mnt/foo and are chrooted into
/jail, passing vfsmount/dentry of /jail/mnt/foo/bar/baz to d_path() will
yield "/mnt/foo/bar/baz", to d_absolute_path() - "/jail/mnt/foo/bar/baz" and
passing the same dentry to dentry_path() - "/bar/baz".

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-08  4:58                     ` Al Viro
@ 2013-09-08  8:51                       ` Ian Kent
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Kent @ 2013-09-08  8:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Waiman Long, linux-fsdevel,
	Linux Kernel Mailing List, Chandramouleeswaran, Aswin, Norton,
	Scott J, George Spelvin, John Stoffel, Sage Weil

On Sun, 2013-09-08 at 05:58 +0100, Al Viro wrote:
> On Sun, Sep 08, 2013 at 12:15:46PM +0800, Ian Kent wrote:
> > > and *name is never modified in it.  Why not simply pass it by value?
> > > Moreover, I'm not sure I understand what do we need sbi->fs_lock in
> > > there.  Other than that, it's very close to dentry_path() (well, that
> > > and different calling conventions).  Ian?
> > 
> > Yes, it is close to dentry_path() but the path functions used to return
> > the full path to the root, although I don't see where dentry_path() get
> > the root, mmm ... 
> 
> dentry_path(), unlike d_path(), is relative to the root of filesystem
> containing dentry in question.  There are 3 of those suckers:
> 	d_path(): vfsmount/dentry => path to current process' root
> 	d_absolute_path(): ditto, but ignores chroot jails (goes to
> the absolute root of namespace)
> 	dentry_path(): dentry => path from root of fs containing that
> dentry.
> 
> IOW, if you have something mounted on /jail/mnt/foo and are chrooted into
> /jail, passing vfsmount/dentry of /jail/mnt/foo/bar/baz to d_path() will
> yield "/mnt/foo/bar/baz", to d_absolute_path() - "/jail/mnt/foo/bar/baz" and
> passing the same dentry to dentry_path() - "/bar/baz".

Right, so all I need to do is drop the leading "/" and I'll have what I
need. I'll do that.

Thanks
Ian



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
  2013-09-07 18:07                   ` Al Viro
  2013-09-07 18:53                     ` Al Viro
@ 2013-09-09 14:31                     ` Waiman Long
  1 sibling, 0 replies; 18+ messages in thread
From: Waiman Long @ 2013-09-09 14:31 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, George Spelvin,
	John Stoffel

On 09/07/2013 02:07 PM, Al Viro wrote:
> On Sat, Sep 07, 2013 at 10:52:02AM -0700, Linus Torvalds wrote:
>
>> So I think we could make a more complicated data structure that looks
>> something like this:
>>
>>     struct seqlock_retry {
>>        unsigned int seq_no;
>>        int state;
>>     };
>>
>> and pass that around. Gcc should do pretty well, especially if we
>> inline things (but even if not, small structures that fit in 64 bytes
>> generate reasonable code even on 32-bit targets, because gcc knows
>> about using two registers for passing data around)..
>>
>> Then you can make "state" have a retry counter in it, and have a
>> negative value mean "I hold the lock for writing". Add a couple of
>> helper functions, and you can fairly easily handle the mixed "try for
>> reading first, then fall back to writing".
>>
>> That said, __d_lookup() still shows up as very performance-critical on
>> some loads (symlinks in particular cause us to fall out of the RCU
>> cases) so I'd like to keep that using the simple pure read case. I
>> don't believe you can livelock it, as mentioned. But the other ones
>> might well be worth moving to a "fall back to write-locking after<n>
>> tries" model. They might all traverse user-specified paths of fairly
>> arbitrary depth, no?
>>
>> So this "seqlock_retry" thing wouldn't _replace_ bare seqlocks, it
>> would just be a helper thing for this kind of behavior where we want
>> to normally do things with just the read-lock, but want to guarantee
>> that we don't live-lock.
>>
>> Sounds reasonable?
> More or less; I just wonder if we are overdesigning here - if we don't
> do "repeat more than once", we can simply use the lower bit of seq -
> read_seqlock() always returns an even value.  So we could do something
> like seqretry_and_lock(lock,&seq):
> 	if ((*seq&  1) || !read_seqretry(lock, *seq))
> 		return true;
> 	*seq |= 1;
> 	write_seqlock(lock);
> 	return false;
> and seqretry_done(lock, seq):
> 	if (seq&  1)
> 		write_sequnlock(lock);
> with these loops turning into
> 	seq = read_seqlock(&rename_lock);
> 	...
> 	if (!seqretry_and_lock(&rename_lock,&seq))
> 		goto again;
> 	...
> 	seqretry_done(&rename_lock);

I am fine with try it once and then lock instead of trying it a few 
times. Now are you planning to have these helper functions for the 
dcache layer only or as part of the seqlock infrastructure? If we are 
going to touch the seqlock layer, I would suggest adding a blocking 
reader that takes the lock, but won't update the sequence number so that 
it won't block other sequence readers as my original seqlock patch was 
doing.

-Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-09-09 14:31 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-06 16:08 [PATCH v3 0/1] dcache: Translating dentry into pathname without taking rename_lock Waiman Long
2013-09-06 16:08 ` [PATCH v3 1/1] " Waiman Long
2013-09-06 20:52   ` Linus Torvalds
2013-09-06 21:05     ` Al Viro
2013-09-06 21:48       ` Linus Torvalds
2013-09-07  0:00         ` Al Viro
2013-09-07  0:19           ` Linus Torvalds
2013-09-07  0:58             ` Linus Torvalds
2013-09-07  3:01               ` Al Viro
2013-09-07 17:32                 ` Al Viro
2013-09-08  4:15                   ` Ian Kent
2013-09-08  4:58                     ` Al Viro
2013-09-08  8:51                       ` Ian Kent
2013-09-07 17:52                 ` Linus Torvalds
2013-09-07 18:07                   ` Al Viro
2013-09-07 18:53                     ` Al Viro
2013-09-09 14:31                     ` Waiman Long
2013-09-07  2:24     ` Waiman Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.