netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
       [not found]                       ` <1311660013.2996.6.camel@edumazet-laptop>
@ 2011-07-26  8:21                         ` Eric Dumazet
  2011-07-26  9:03                           ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-07-26  8:21 UTC (permalink / raw)
  To: Tim Chen, Al Viro, David Miller
  Cc: Christoph Hellwig, Andi Kleen, Matthew Wilcox, Anton Blanchard,
	npiggin, linux-kernel, linux-fsdevel, netdev

Le mardi 26 juillet 2011 à 08:00 +0200, Eric Dumazet a écrit :

> Next step is to not chain pipes/sockets into superblock s_inodes list
> 
> inode_sb_list_add()/inode_sb_list_del() is the very last contention
> point because of spin_lock(&inode_sb_list_lock);

Well, not 'last' contention point, as we still hit remove_inode_hash(),
inode_wb_list_del(), inode_lru_list_del(), but thats a clear win on my
2x4x2 machine : 9 seconds instead of 22 on a close(socket()) benchmark.


[PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list

Workloads using pipes and sockets hit inode_sb_list_lock contention.

superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/anon_inodes.c   |    2 +-
 fs/inode.c         |   31 +++++++++++++++++++++----------
 fs/pipe.c          |    2 +-
 include/linux/fs.h |    3 ++-
 net/socket.c       |    2 +-
 5 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..269499e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
  */
 static struct inode *anon_inode_mkinode(void)
 {
-	struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+	struct inode *inode = __new_inode(anon_inode_mnt->mnt_sb);
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..8a6d62b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-	spin_lock(&inode_sb_list_lock);
-	list_del_init(&inode->i_sb_list);
-	spin_unlock(&inode_sb_list_lock);
+	if (!list_empty(&inode->i_sb_list)) {
+		spin_lock(&inode_sb_list_lock);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_sb_list_lock);
+	}
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -796,6 +798,19 @@ unsigned int get_next_ino(void)
 }
 EXPORT_SYMBOL(get_next_ino);
 
+struct inode *__new_inode(struct super_block *sb)
+{
+	struct inode *inode = alloc_inode(sb);
+
+	if (inode) {
+		spin_lock(&inode->i_lock);
+		inode->i_state = 0;
+		spin_unlock(&inode->i_lock);
+		INIT_LIST_HEAD(&inode->i_sb_list);
+	}
+	return inode;
+}
+
 /**
  *	new_inode 	- obtain an inode
  *	@sb: superblock
@@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb)
 
 	spin_lock_prefetch(&inode_sb_list_lock);
 
-	inode = alloc_inode(sb);
-	if (inode) {
-		spin_lock(&inode->i_lock);
-		inode->i_state = 0;
-		spin_unlock(&inode->i_lock);
-		inode_sb_list_add(inode);
-	}
+	inode = __new_inode(sb);
+	if (inode)
+			inode_sb_list_add(inode);
 	return inode;
 }
 EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..937b962 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
 
 static struct inode * get_pipe_inode(void)
 {
-	struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+	struct inode *inode = __new_inode(pipe_mnt->mnt_sb);
 	struct pipe_inode_info *pipe;
 
 	if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..60be54f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);
 extern void end_writeback(struct inode *);
 extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *__new_inode(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..b4b8a08 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
 	struct inode *inode;
 	struct socket *sock;
 
-	inode = new_inode(sock_mnt->mnt_sb);
+	inode = __new_inode(sock_mnt->mnt_sb);
 	if (!inode)
 		return NULL;
 


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
  2011-07-26  8:21                         ` [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Eric Dumazet
@ 2011-07-26  9:03                           ` Christoph Hellwig
  2011-07-26  9:36                             ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-26  9:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tim Chen, Al Viro, David Miller, Christoph Hellwig, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> Well, not 'last' contention point, as we still hit remove_inode_hash(),

There should be no ned to put pipe or anon inodes on the inode hash.
Probably sockets don't need it either, but I'd need to look at it in
detail.

> inode_wb_list_del()

The should never be on the wb list either, doing an unlocked check for
actually beeing on the list before taking the lock should help you.

> inode_lru_list_del(),

No real need to keep inodes in the LRU if we only allocate them using
new_inode but never look them up either.  You might want to try setting
.drop_inode to generic_delete_inode for these.

> +struct inode *__new_inode(struct super_block *sb)
> +{
> +	struct inode *inode = alloc_inode(sb);
> +
> +	if (inode) {
> +		spin_lock(&inode->i_lock);
> +		inode->i_state = 0;
> +		spin_unlock(&inode->i_lock);
> +		INIT_LIST_HEAD(&inode->i_sb_list);
> +	}
> +	return inode;
> +}

This needs a much better name like new_inode_pseudo, and a kerneldoc 
comment explaining when it is safe to use, and the consequences, which
appear to me:

 - fs may never be unmount
 - quotas can't work on the filesystem
 - writeback can't work on the filesystem

> @@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb)
>  
>  	spin_lock_prefetch(&inode_sb_list_lock);
>  
> -	inode = alloc_inode(sb);
> -	if (inode) {
> -		spin_lock(&inode->i_lock);
> -		inode->i_state = 0;
> -		spin_unlock(&inode->i_lock);
> -		inode_sb_list_add(inode);
> -	}
> +	inode = __new_inode(sb);
> +	if (inode)
> +			inode_sb_list_add(inode);

bad indentation.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
  2011-07-26  9:03                           ` Christoph Hellwig
@ 2011-07-26  9:36                             ` Eric Dumazet
  2011-07-26  9:42                               ` Christoph Hellwig
  2011-07-27 15:21                               ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-26  9:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev

Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit :
> On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> > Well, not 'last' contention point, as we still hit remove_inode_hash(),
> 
> There should be no ned to put pipe or anon inodes on the inode hash.
> Probably sockets don't need it either, but I'd need to look at it in
> detail.
> 
> > inode_wb_list_del()
> 
> The should never be on the wb list either, doing an unlocked check for
> actually beeing on the list before taking the lock should help you.

Yes, it might even help regular inodes ;)

> 
> > inode_lru_list_del(),
> 
> No real need to keep inodes in the LRU if we only allocate them using
> new_inode but never look them up either.  You might want to try setting
> .drop_inode to generic_delete_inode for these.

Yes, I'll take a look, thanks.

> 
> > +struct inode *__new_inode(struct super_block *sb)
> > +{
> > +	struct inode *inode = alloc_inode(sb);
> > +
> > +	if (inode) {
> > +		spin_lock(&inode->i_lock);
> > +		inode->i_state = 0;
> > +		spin_unlock(&inode->i_lock);
> > +		INIT_LIST_HEAD(&inode->i_sb_list);
> > +	}
> > +	return inode;
> > +}
> 
> This needs a much better name like new_inode_pseudo, and a kerneldoc 
> comment explaining when it is safe to use, and the consequences, which
> appear to me:
> 
>  - fs may never be unmount
>  - quotas can't work on the filesystem
>  - writeback can't work on the filesystem

Thanks for reviewing, here is v2 of the patch, addressing your comments.


[PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list

Workloads using pipes and sockets hit inode_sb_list_lock contention.

superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
v2: address Christoph comments

 fs/anon_inodes.c   |    2 +-
 fs/inode.c         |   39 ++++++++++++++++++++++++++++++---------
 fs/pipe.c          |    2 +-
 include/linux/fs.h |    3 ++-
 net/socket.c       |    2 +-
 5 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..f11e43e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
  */
 static struct inode *anon_inode_mkinode(void)
 {
-	struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+	struct inode *inode = new_inode_pseudo(anon_inode_mnt->mnt_sb);
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..319b93b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-	spin_lock(&inode_sb_list_lock);
-	list_del_init(&inode->i_sb_list);
-	spin_unlock(&inode_sb_list_lock);
+	if (!list_empty(&inode->i_sb_list)) {
+		spin_lock(&inode_sb_list_lock);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_sb_list_lock);
+	}
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -797,6 +799,29 @@ unsigned int get_next_ino(void)
 EXPORT_SYMBOL(get_next_ino);
 
 /**
+ *	new_inode_pseudo 	- obtain an inode
+ *	@sb: superblock
+ *
+ *	Allocates a new inode for given superblock.
+ *	Inode wont be chained in superblock s_inodes list
+ *	This means :
+ *	- fs can't be unmount
+ *	- quotas, fsnotify, writeback can't work
+ */
+struct inode *new_inode_pseudo(struct super_block *sb)
+{
+	struct inode *inode = alloc_inode(sb);
+
+	if (inode) {
+		spin_lock(&inode->i_lock);
+		inode->i_state = 0;
+		spin_unlock(&inode->i_lock);
+		INIT_LIST_HEAD(&inode->i_sb_list);
+	}
+	return inode;
+}
+
+/**
  *	new_inode 	- obtain an inode
  *	@sb: superblock
  *
@@ -814,13 +839,9 @@ struct inode *new_inode(struct super_block *sb)
 
 	spin_lock_prefetch(&inode_sb_list_lock);
 
-	inode = alloc_inode(sb);
-	if (inode) {
-		spin_lock(&inode->i_lock);
-		inode->i_state = 0;
-		spin_unlock(&inode->i_lock);
+	inode = new_inode_pseudo(sb);
+	if (inode)
 		inode_sb_list_add(inode);
-	}
 	return inode;
 }
 EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..0e0be1d 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
 
 static struct inode * get_pipe_inode(void)
 {
-	struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+	struct inode *inode = new_inode_pseudo(pipe_mnt->mnt_sb);
 	struct pipe_inode_info *pipe;
 
 	if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..cc363fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);
 extern void end_writeback(struct inode *);
 extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *new_inode_pseudo(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..26ed35c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
 	struct inode *inode;
 	struct socket *sock;
 
-	inode = new_inode(sock_mnt->mnt_sb);
+	inode = new_inode_pseudo(sock_mnt->mnt_sb);
 	if (!inode)
 		return NULL;
 



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
  2011-07-26  9:36                             ` Eric Dumazet
@ 2011-07-26  9:42                               ` Christoph Hellwig
  2011-07-26 10:43                                 ` Eric Dumazet
  2011-07-27 15:21                               ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-26  9:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> 
> Workloads using pipes and sockets hit inode_sb_list_lock contention.
> 
> superblock s_inodes list is needed for quota, dirty, pagecache and
> fsnotify management. pipe/anon/socket fs are clearly not candidates for
> these.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good to me,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
  2011-07-26  9:42                               ` Christoph Hellwig
@ 2011-07-26 10:43                                 ` Eric Dumazet
  2011-07-26 11:49                                   ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-07-26 10:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev

Le mardi 26 juillet 2011 à 05:42 -0400, Christoph Hellwig a écrit :
> On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> > [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> > 
> > Workloads using pipes and sockets hit inode_sb_list_lock contention.
> > 
> > superblock s_inodes list is needed for quota, dirty, pagecache and
> > fsnotify management. pipe/anon/socket fs are clearly not candidates for
> > these.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Looks good to me,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

Thanks !

BTW, we have one atomic op that could be avoided in new_inode()

spin_lock(&inode->i_lock);
inode->i_state = 0;
spin_unlock(&inode->i_lock);

can probably be changed to something less expensive...

inode->i_state = 0;
smp_wmb();

Not clear if we really need a memory barrier either....



--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
  2011-07-26 10:43                                 ` Eric Dumazet
@ 2011-07-26 11:49                                   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-26 11:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

On Tue, Jul 26, 2011 at 12:43:33PM +0200, Eric Dumazet wrote:
> BTW, we have one atomic op that could be avoided in new_inode()
> 
> spin_lock(&inode->i_lock);
> inode->i_state = 0;
> spin_unlock(&inode->i_lock);
> 
> can probably be changed to something less expensive...
> 
> inode->i_state = 0;
> smp_wmb();
> 
> Not clear if we really need a memory barrier either....

I think we already had this in some of the earlier vfs/inode scale
series, but it got lost when Al asked to just put the fundamental
changes in.

For plain new_inode() the barrier shouldn't be needed as we take
the sb list lock just a little later.  I'm not sure about your new
variant, so I'll rather lave that to you.

There's a few other things missing from earlier iterations, most notable
the non-atomic i_count, and the bucket locks for the inode hash, if
you're eager enough to look into that area.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] vfs: avoid taking locks if inode not in lists
  2011-07-26  9:36                             ` Eric Dumazet
  2011-07-26  9:42                               ` Christoph Hellwig
@ 2011-07-27 15:21                               ` Eric Dumazet
  2011-07-27 17:12                                 ` Andi Kleen
  2011-07-27 20:44                                 ` Christoph Hellwig
  1 sibling, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-27 15:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev

Le mardi 26 juillet 2011 à 11:36 +0200, Eric Dumazet a écrit :
> Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit :
> > On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> > > Well, not 'last' contention point, as we still hit remove_inode_hash(),
> > 
> > There should be no ned to put pipe or anon inodes on the inode hash.
> > Probably sockets don't need it either, but I'd need to look at it in
> > detail.
> > 
> > > inode_wb_list_del()
> > 
> > The should never be on the wb list either, doing an unlocked check for
> > actually beeing on the list before taking the lock should help you.
> 
> Yes, it might even help regular inodes ;)
> 
> > 
> > > inode_lru_list_del(),
> > 
> > No real need to keep inodes in the LRU if we only allocate them using
> > new_inode but never look them up either.  You might want to try setting
> > .drop_inode to generic_delete_inode for these.
> 
> Yes, I'll take a look, thanks.

If I am not mistaken, we can add unlocked checks on the three hot spots.

After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
my dev machine takes ~3us instead of ~9us.

Maybe its better to split it in three patches, just let me know.

22us -> 3us, thats a nice patch series ;)

Thanks

[PATCH] vfs: avoid taking locks if inode not in lists

sockets and pipes inodes destruction hits three possibly contended
locks :

system-wide inode_hash_lock in remove_inode_hash()
superblock s_inode_lru_lock in inode_lru_list_del()
bdi wb.list_lock in inode_wb_list_del()

Before even taking locks, we can perform an unlocked test to check if
inode can possibly be in the lists.

On a 2x4x2 machine, a close(socket()) pair can be 200% faster with these
changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/fs-writeback.c |   10 ++++++----
 fs/inode.c        |    6 ++++++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1599aa9..8b90bdb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -182,11 +182,13 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
  */
 void inode_wb_list_del(struct inode *inode)
 {
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
+	if (!list_empty(&inode->i_wb_list)) {
+		struct backing_dev_info *bdi = inode_to_bdi(inode);
 
-	spin_lock(&bdi->wb.list_lock);
-	list_del_init(&inode->i_wb_list);
-	spin_unlock(&bdi->wb.list_lock);
+		spin_lock(&bdi->wb.list_lock);
+		list_del_init(&inode->i_wb_list);
+		spin_unlock(&bdi->wb.list_lock);
+	}
 }
 
 /*
diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..796a420 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -338,6 +338,9 @@ static void inode_lru_list_add(struct inode *inode)
 
 static void inode_lru_list_del(struct inode *inode)
 {
+	if (list_empty(&inode->i_lru))
+		return;
+
 	spin_lock(&inode->i_sb->s_inode_lru_lock);
 	if (!list_empty(&inode->i_lru)) {
 		list_del_init(&inode->i_lru);
@@ -406,6 +409,9 @@ EXPORT_SYMBOL(__insert_inode_hash);
  */
 void remove_inode_hash(struct inode *inode)
 {
+	if (inode_unhashed(inode))
+		return;
+
 	spin_lock(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
 	hlist_del_init(&inode->i_hash);


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: avoid taking locks if inode not in lists
  2011-07-27 15:21                               ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
@ 2011-07-27 17:12                                 ` Andi Kleen
  2011-07-27 20:44                                 ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2011-07-27 17:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

> Before even taking locks, we can perform an unlocked test to check if
> inode can possibly be in the lists.
> 
> On a 2x4x2 machine, a close(socket()) pair can be 200% faster with these
> changes.

Great result! Seems simple and obvious to me.

-Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: avoid taking locks if inode not in lists
  2011-07-27 15:21                               ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
  2011-07-27 17:12                                 ` Andi Kleen
@ 2011-07-27 20:44                                 ` Christoph Hellwig
  2011-07-27 20:59                                   ` Andi Kleen
                                                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-27 20:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote:
> If I am not mistaken, we can add unlocked checks on the three hot spots.
> 
> After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
> my dev machine takes ~3us instead of ~9us.
> 
> Maybe its better to split it in three patches, just let me know.

I think three patches would be a lot cleaner.

As for safety of the unlocked checks:

 - inode are either hashed when created or never, so that one looks
   fine.
 - same for the sb list.
 - the writeback list is a bit more dynamic as we move things around
   quite a bit.  But in additon to the inode_wb_list_del call from
   evict() it only ever gets remove in writeback_single_inode, which
   for a freeing inode can only be called from the callers of evict().

Btw, I wonder if you should micro-optimize things a bit further by
moving the unhashed checks from the deletion functions into the callers
and thus save a function call for each of them.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: avoid taking locks if inode not in lists
  2011-07-27 20:44                                 ` Christoph Hellwig
@ 2011-07-27 20:59                                   ` Andi Kleen
  2011-07-27 21:01                                     ` Christoph Hellwig
  2011-07-28  4:41                                   ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
  2011-07-28  4:55                                   ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet
  2 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2011-07-27 20:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Dumazet, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

> Btw, I wonder if you should micro-optimize things a bit further by
> moving the unhashed checks from the deletion functions into the callers
> and thus save a function call for each of them.

If the caller is in the same file modern gcc is able to do that automatically
if you're lucky enough ("partial inlining")

I would not uglify the code for it.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: avoid taking locks if inode not in lists
  2011-07-27 20:59                                   ` Andi Kleen
@ 2011-07-27 21:01                                     ` Christoph Hellwig
  2011-07-28  4:11                                       ` [PATCH] vfs: conditionally call inode_wb_list_del() Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-27 21:01 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Eric Dumazet, Tim Chen, Al Viro, David Miller,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev

On Wed, Jul 27, 2011 at 10:59:57PM +0200, Andi Kleen wrote:
> > Btw, I wonder if you should micro-optimize things a bit further by
> > moving the unhashed checks from the deletion functions into the callers
> > and thus save a function call for each of them.
> 
> If the caller is in the same file modern gcc is able to do that automatically
> if you're lucky enough ("partial inlining")
> 
> I would not uglify the code for it.

Depending on how you look at it the code might actually be a tad
cleaner.  One of called functions is outside of inode.c.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] vfs: conditionally call inode_wb_list_del()
  2011-07-27 21:01                                     ` Christoph Hellwig
@ 2011-07-28  4:11                                       ` Eric Dumazet
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-28  4:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andi Kleen, Tim Chen, Al Viro, David Miller, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev

Le mercredi 27 juillet 2011 à 17:01 -0400, Christoph Hellwig a écrit :
> On Wed, Jul 27, 2011 at 10:59:57PM +0200, Andi Kleen wrote:
> > > Btw, I wonder if you should micro-optimize things a bit further by
> > > moving the unhashed checks from the deletion functions into the callers
> > > and thus save a function call for each of them.
> > 
> > If the caller is in the same file modern gcc is able to do that automatically
> > if you're lucky enough ("partial inlining")
> > 
> > I would not uglify the code for it.
> 
> Depending on how you look at it the code might actually be a tad
> cleaner.  One of called functions is outside of inode.c.
> 

Thats right, thanks again for your valuable input Christoph.

The following is a clear win, since we avoid the call to external
function.

[PATCH] vfs: conditionally call inode_wb_list_del()

Some inodes (pipes, sockets, ...) are not in bdi writeback list.

evict() can avoid calling inode_wb_list_del() and its expensive spinlock
by checking inode i_wb_list being empty or not.

At this point, no other cpu/user can concurrently manipulate this inode
i_wb_list

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/inode.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..9dab13a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -454,7 +454,9 @@ static void evict(struct inode *inode)
 	BUG_ON(!(inode->i_state & I_FREEING));
 	BUG_ON(!list_empty(&inode->i_lru));
 
-	inode_wb_list_del(inode);
+	if (!list_empty(&inode->i_wb_list))
+		inode_wb_list_del(inode);
+
 	inode_sb_list_del(inode);
 
 	if (op->evict_inode) {


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] vfs: avoid taking locks if inode not in lists
  2011-07-27 20:44                                 ` Christoph Hellwig
  2011-07-27 20:59                                   ` Andi Kleen
@ 2011-07-28  4:41                                   ` Eric Dumazet
  2011-07-28  4:55                                   ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet
  2 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-28  4:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev

Le mercredi 27 juillet 2011 à 16:44 -0400, Christoph Hellwig a écrit :
> On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote:
> > If I am not mistaken, we can add unlocked checks on the three hot spots.
> > 
> > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
> > my dev machine takes ~3us instead of ~9us.
> > 
> > Maybe its better to split it in three patches, just let me know.
> 
> I think three patches would be a lot cleaner.
> 
> As for safety of the unlocked checks:
> 
>  - inode are either hashed when created or never, so that one looks
>    fine.
>  - same for the sb list.
>  - the writeback list is a bit more dynamic as we move things around
>    quite a bit.  But in additon to the inode_wb_list_del call from
>    evict() it only ever gets remove in writeback_single_inode, which
>    for a freeing inode can only be called from the callers of evict().
> 
> Btw, I wonder if you should micro-optimize things a bit further by
> moving the unhashed checks from the deletion functions into the callers
> and thus save a function call for each of them.
> 

What about following patch, addressing the micro-optimization and Andi
Kleen concern about evict() readability ?

Thanks !

[PATCH] vfs: avoid taking inode_hash_lock on pipes and sockets

Some inodes (pipes, sockets, ...) are not hashed, no need to take
contended inode_hash_lock at dismantle time.

nice speedup on SMP machines on socket intensive workloads.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/inode.c         |    6 +++---
 include/linux/fs.h |    9 ++++++++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..73b5598 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -399,12 +399,12 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
 EXPORT_SYMBOL(__insert_inode_hash);
 
 /**
- *	remove_inode_hash - remove an inode from the hash
+ *	__remove_inode_hash - remove an inode from the hash
  *	@inode: inode to unhash
  *
  *	Remove an inode from the superblock.
  */
-void remove_inode_hash(struct inode *inode)
+void __remove_inode_hash(struct inode *inode)
 {
 	spin_lock(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
@@ -412,7 +412,7 @@ void remove_inode_hash(struct inode *inode)
 	spin_unlock(&inode->i_lock);
 	spin_unlock(&inode_hash_lock);
 }
-EXPORT_SYMBOL(remove_inode_hash);
+EXPORT_SYMBOL(__remove_inode_hash);
 
 void end_writeback(struct inode *inode)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f23bcb7..786b3b1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2317,11 +2317,18 @@ extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
-extern void remove_inode_hash(struct inode *);
 static inline void insert_inode_hash(struct inode *inode)
 {
 	__insert_inode_hash(inode, inode->i_ino);
 }
+
+extern void __remove_inode_hash(struct inode *);
+static inline void remove_inode_hash(struct inode *inode)
+{
+	if (!inode_unhashed(inode))
+		__remove_inode_hash(inode);
+}
+
 extern void inode_sb_list_add(struct inode *inode);
 
 #ifdef CONFIG_BLOCK


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] vfs: avoid call to inode_lru_list_del() if possible
  2011-07-27 20:44                                 ` Christoph Hellwig
  2011-07-27 20:59                                   ` Andi Kleen
  2011-07-28  4:41                                   ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
@ 2011-07-28  4:55                                   ` Eric Dumazet
  2 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-28  4:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev

Le mercredi 27 juillet 2011 à 16:44 -0400, Christoph Hellwig a écrit :
> On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote:
> > If I am not mistaken, we can add unlocked checks on the three hot spots.
> > 
> > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
> > my dev machine takes ~3us instead of ~9us.
> > 
> > Maybe its better to split it in three patches, just let me know.
> 
> I think three patches would be a lot cleaner.
> 
> As for safety of the unlocked checks:
> 
>  - inode are either hashed when created or never, so that one looks
>    fine.
>  - same for the sb list.
>  - the writeback list is a bit more dynamic as we move things around
>    quite a bit.  But in additon to the inode_wb_list_del call from
>    evict() it only ever gets remove in writeback_single_inode, which
>    for a freeing inode can only be called from the callers of evict().
> 
> Btw, I wonder if you should micro-optimize things a bit further by
> moving the unhashed checks from the deletion functions into the callers
> and thus save a function call for each of them.
> 

Here is the last patch, addressing inode_lru_list_del() call.

Only the call done from iput_final() can obviously benefit from checking
i_lru being empty or not, so it makes sense to perform the check at
caller site instead of doing it in inode_lru_list_del()

[PATCH] vfs: avoid call to inode_lru_list_del() if possible

inode_lru_list_del() is expensive because of per superblock lru locking,
while some inodes are not in lru list.

Adding a check in iput_final() can speedup pipe/sockets workloads on
SMP.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/inode.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..b8b8939 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1328,7 +1328,8 @@ static void iput_final(struct inode *inode)
 	}
 
 	inode->i_state |= I_FREEING;
-	inode_lru_list_del(inode);
+	if (!list_empty(&inode->i_lru))
+		inode_lru_list_del(inode);
 	spin_unlock(&inode->i_lock);
 
 	evict(inode);


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-07-28  4:55 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20110717010427.GC5359@parisc-linux.org>
     [not found] ` <20110717084630.GC8006@one.firstfloor.org>
     [not found]   ` <1310977028.5756.1.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
     [not found]     ` <20110718154008.GA29593@infradead.org>
     [not found]       ` <20110718155141.GA11013@ZenIV.linux.org.uk>
     [not found]         ` <1311093158.2707.75.camel@schen9-DESK>
     [not found]           ` <20110721204042.GB31405@ZenIV.linux.org.uk>
     [not found]             ` <1311294452.2576.18.camel@schen9-DESK>
     [not found]               ` <20110723132411.GA22183@infradead.org>
     [not found]                 ` <1311633550.2576.33.camel@schen9-DESK>
     [not found]                   ` <20110725225154.GD22133@ZenIV.linux.org.uk>
     [not found]                     ` <1311636178.2576.34.camel@schen9-DESK>
     [not found]                       ` <1311660013.2996.6.camel@edumazet-laptop>
2011-07-26  8:21                         ` [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Eric Dumazet
2011-07-26  9:03                           ` Christoph Hellwig
2011-07-26  9:36                             ` Eric Dumazet
2011-07-26  9:42                               ` Christoph Hellwig
2011-07-26 10:43                                 ` Eric Dumazet
2011-07-26 11:49                                   ` Christoph Hellwig
2011-07-27 15:21                               ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-27 17:12                                 ` Andi Kleen
2011-07-27 20:44                                 ` Christoph Hellwig
2011-07-27 20:59                                   ` Andi Kleen
2011-07-27 21:01                                     ` Christoph Hellwig
2011-07-28  4:11                                       ` [PATCH] vfs: conditionally call inode_wb_list_del() Eric Dumazet
2011-07-28  4:41                                   ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-28  4:55                                   ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).