* [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
[not found] ` <1311660013.2996.6.camel@edumazet-laptop>
@ 2011-07-26 8:21 ` Eric Dumazet
2011-07-26 9:03 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-07-26 8:21 UTC (permalink / raw)
To: Tim Chen, Al Viro, David Miller
Cc: Christoph Hellwig, Andi Kleen, Matthew Wilcox, Anton Blanchard,
npiggin, linux-kernel, linux-fsdevel, netdev
Le mardi 26 juillet 2011 à 08:00 +0200, Eric Dumazet a écrit :
> Next step is to not chain pipes/sockets into superblock s_inodes list
>
> inode_sb_list_add()/inode_sb_list_del() is the very last contention
> point because of spin_lock(&inode_sb_list_lock);
Well, not 'last' contention point, as we still hit remove_inode_hash(),
inode_wb_list_del(), inode_lru_list_del(), but thats a clear win on my
2x4x2 machine : 9 seconds instead of 22 on a close(socket()) benchmark.
[PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
Workloads using pipes and sockets hit inode_sb_list_lock contention.
superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
fs/anon_inodes.c | 2 +-
fs/inode.c | 31 +++++++++++++++++++++----------
fs/pipe.c | 2 +-
include/linux/fs.h | 3 ++-
net/socket.c | 2 +-
5 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..269499e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
*/
static struct inode *anon_inode_mkinode(void)
{
- struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+ struct inode *inode = __new_inode(anon_inode_mnt->mnt_sb);
if (!inode)
return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..8a6d62b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
static inline void inode_sb_list_del(struct inode *inode)
{
- spin_lock(&inode_sb_list_lock);
- list_del_init(&inode->i_sb_list);
- spin_unlock(&inode_sb_list_lock);
+ if (!list_empty(&inode->i_sb_list)) {
+ spin_lock(&inode_sb_list_lock);
+ list_del_init(&inode->i_sb_list);
+ spin_unlock(&inode_sb_list_lock);
+ }
}
static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -796,6 +798,19 @@ unsigned int get_next_ino(void)
}
EXPORT_SYMBOL(get_next_ino);
+struct inode *__new_inode(struct super_block *sb)
+{
+ struct inode *inode = alloc_inode(sb);
+
+ if (inode) {
+ spin_lock(&inode->i_lock);
+ inode->i_state = 0;
+ spin_unlock(&inode->i_lock);
+ INIT_LIST_HEAD(&inode->i_sb_list);
+ }
+ return inode;
+}
+
/**
* new_inode - obtain an inode
* @sb: superblock
@@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb)
spin_lock_prefetch(&inode_sb_list_lock);
- inode = alloc_inode(sb);
- if (inode) {
- spin_lock(&inode->i_lock);
- inode->i_state = 0;
- spin_unlock(&inode->i_lock);
- inode_sb_list_add(inode);
- }
+ inode = __new_inode(sb);
+ if (inode)
+ inode_sb_list_add(inode);
return inode;
}
EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..937b962 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
static struct inode * get_pipe_inode(void)
{
- struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+ struct inode *inode = __new_inode(pipe_mnt->mnt_sb);
struct pipe_inode_info *pipe;
if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..60be54f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
extern void iget_failed(struct inode *);
extern void end_writeback(struct inode *);
extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *__new_inode(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
extern void free_inode_nonrcu(struct inode *inode);
extern int should_remove_suid(struct dentry *);
extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..b4b8a08 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
struct inode *inode;
struct socket *sock;
- inode = new_inode(sock_mnt->mnt_sb);
+ inode = __new_inode(sock_mnt->mnt_sb);
if (!inode)
return NULL;
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
2011-07-26 8:21 ` [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Eric Dumazet
@ 2011-07-26 9:03 ` Christoph Hellwig
2011-07-26 9:36 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-26 9:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: Tim Chen, Al Viro, David Miller, Christoph Hellwig, Andi Kleen,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> Well, not 'last' contention point, as we still hit remove_inode_hash(),
There should be no ned to put pipe or anon inodes on the inode hash.
Probably sockets don't need it either, but I'd need to look at it in
detail.
> inode_wb_list_del()
The should never be on the wb list either, doing an unlocked check for
actually beeing on the list before taking the lock should help you.
> inode_lru_list_del(),
No real need to keep inodes in the LRU if we only allocate them using
new_inode but never look them up either. You might want to try setting
.drop_inode to generic_delete_inode for these.
> +struct inode *__new_inode(struct super_block *sb)
> +{
> + struct inode *inode = alloc_inode(sb);
> +
> + if (inode) {
> + spin_lock(&inode->i_lock);
> + inode->i_state = 0;
> + spin_unlock(&inode->i_lock);
> + INIT_LIST_HEAD(&inode->i_sb_list);
> + }
> + return inode;
> +}
This needs a much better name like new_inode_pseudo, and a kerneldoc
comment explaining when it is safe to use, and the consequences, which
appear to me:
- fs may never be unmount
- quotas can't work on the filesystem
- writeback can't work on the filesystem
> @@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb)
>
> spin_lock_prefetch(&inode_sb_list_lock);
>
> - inode = alloc_inode(sb);
> - if (inode) {
> - spin_lock(&inode->i_lock);
> - inode->i_state = 0;
> - spin_unlock(&inode->i_lock);
> - inode_sb_list_add(inode);
> - }
> + inode = __new_inode(sb);
> + if (inode)
> + inode_sb_list_add(inode);
bad indentation.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
2011-07-26 9:03 ` Christoph Hellwig
@ 2011-07-26 9:36 ` Eric Dumazet
2011-07-26 9:42 ` Christoph Hellwig
2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
0 siblings, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-26 9:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit :
> On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> > Well, not 'last' contention point, as we still hit remove_inode_hash(),
>
> There should be no ned to put pipe or anon inodes on the inode hash.
> Probably sockets don't need it either, but I'd need to look at it in
> detail.
>
> > inode_wb_list_del()
>
> The should never be on the wb list either, doing an unlocked check for
> actually beeing on the list before taking the lock should help you.
Yes, it might even help regular inodes ;)
>
> > inode_lru_list_del(),
>
> No real need to keep inodes in the LRU if we only allocate them using
> new_inode but never look them up either. You might want to try setting
> .drop_inode to generic_delete_inode for these.
Yes, I'll take a look, thanks.
>
> > +struct inode *__new_inode(struct super_block *sb)
> > +{
> > + struct inode *inode = alloc_inode(sb);
> > +
> > + if (inode) {
> > + spin_lock(&inode->i_lock);
> > + inode->i_state = 0;
> > + spin_unlock(&inode->i_lock);
> > + INIT_LIST_HEAD(&inode->i_sb_list);
> > + }
> > + return inode;
> > +}
>
> This needs a much better name like new_inode_pseudo, and a kerneldoc
> comment explaining when it is safe to use, and the consequences, which
> appear to me:
>
> - fs may never be unmount
> - quotas can't work on the filesystem
> - writeback can't work on the filesystem
Thanks for reviewing, here is v2 of the patch, addressing your comments.
[PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
Workloads using pipes and sockets hit inode_sb_list_lock contention.
superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
v2: address Christoph comments
fs/anon_inodes.c | 2 +-
fs/inode.c | 39 ++++++++++++++++++++++++++++++---------
fs/pipe.c | 2 +-
include/linux/fs.h | 3 ++-
net/socket.c | 2 +-
5 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..f11e43e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
*/
static struct inode *anon_inode_mkinode(void)
{
- struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+ struct inode *inode = new_inode_pseudo(anon_inode_mnt->mnt_sb);
if (!inode)
return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..319b93b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
static inline void inode_sb_list_del(struct inode *inode)
{
- spin_lock(&inode_sb_list_lock);
- list_del_init(&inode->i_sb_list);
- spin_unlock(&inode_sb_list_lock);
+ if (!list_empty(&inode->i_sb_list)) {
+ spin_lock(&inode_sb_list_lock);
+ list_del_init(&inode->i_sb_list);
+ spin_unlock(&inode_sb_list_lock);
+ }
}
static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -797,6 +799,29 @@ unsigned int get_next_ino(void)
EXPORT_SYMBOL(get_next_ino);
/**
+ * new_inode_pseudo - obtain an inode
+ * @sb: superblock
+ *
+ * Allocates a new inode for given superblock.
+ * Inode wont be chained in superblock s_inodes list
+ * This means :
+ * - fs can't be unmount
+ * - quotas, fsnotify, writeback can't work
+ */
+struct inode *new_inode_pseudo(struct super_block *sb)
+{
+ struct inode *inode = alloc_inode(sb);
+
+ if (inode) {
+ spin_lock(&inode->i_lock);
+ inode->i_state = 0;
+ spin_unlock(&inode->i_lock);
+ INIT_LIST_HEAD(&inode->i_sb_list);
+ }
+ return inode;
+}
+
+/**
* new_inode - obtain an inode
* @sb: superblock
*
@@ -814,13 +839,9 @@ struct inode *new_inode(struct super_block *sb)
spin_lock_prefetch(&inode_sb_list_lock);
- inode = alloc_inode(sb);
- if (inode) {
- spin_lock(&inode->i_lock);
- inode->i_state = 0;
- spin_unlock(&inode->i_lock);
+ inode = new_inode_pseudo(sb);
+ if (inode)
inode_sb_list_add(inode);
- }
return inode;
}
EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..0e0be1d 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
static struct inode * get_pipe_inode(void)
{
- struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+ struct inode *inode = new_inode_pseudo(pipe_mnt->mnt_sb);
struct pipe_inode_info *pipe;
if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..cc363fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
extern void iget_failed(struct inode *);
extern void end_writeback(struct inode *);
extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *new_inode_pseudo(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
extern void free_inode_nonrcu(struct inode *inode);
extern int should_remove_suid(struct dentry *);
extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..26ed35c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
struct inode *inode;
struct socket *sock;
- inode = new_inode(sock_mnt->mnt_sb);
+ inode = new_inode_pseudo(sock_mnt->mnt_sb);
if (!inode)
return NULL;
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
2011-07-26 9:36 ` Eric Dumazet
@ 2011-07-26 9:42 ` Christoph Hellwig
2011-07-26 10:43 ` Eric Dumazet
2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-26 9:42 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
>
> Workloads using pipes and sockets hit inode_sb_list_lock contention.
>
> superblock s_inodes list is needed for quota, dirty, pagecache and
> fsnotify management. pipe/anon/socket fs are clearly not candidates for
> these.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Looks good to me,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
2011-07-26 9:42 ` Christoph Hellwig
@ 2011-07-26 10:43 ` Eric Dumazet
2011-07-26 11:49 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-07-26 10:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
Le mardi 26 juillet 2011 à 05:42 -0400, Christoph Hellwig a écrit :
> On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> > [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> >
> > Workloads using pipes and sockets hit inode_sb_list_lock contention.
> >
> > superblock s_inodes list is needed for quota, dirty, pagecache and
> > fsnotify management. pipe/anon/socket fs are clearly not candidates for
> > these.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Looks good to me,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
Thanks !
BTW, we have one atomic op that could be avoided in new_inode()
spin_lock(&inode->i_lock);
inode->i_state = 0;
spin_unlock(&inode->i_lock);
can probably be changed to something less expensive...
inode->i_state = 0;
smp_wmb();
Not clear if we really need a memory barrier either....
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
2011-07-26 10:43 ` Eric Dumazet
@ 2011-07-26 11:49 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-26 11:49 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
On Tue, Jul 26, 2011 at 12:43:33PM +0200, Eric Dumazet wrote:
> BTW, we have one atomic op that could be avoided in new_inode()
>
> spin_lock(&inode->i_lock);
> inode->i_state = 0;
> spin_unlock(&inode->i_lock);
>
> can probably be changed to something less expensive...
>
> inode->i_state = 0;
> smp_wmb();
>
> Not clear if we really need a memory barrier either....
I think we already had this in some of the earlier vfs/inode scale
series, but it got lost when Al asked to just put the fundamental
changes in.
For plain new_inode() the barrier shouldn't be needed as we take
the sb list lock just a little later. I'm not sure about your new
variant, so I'll rather lave that to you.
There's a few other things missing from earlier iterations, most notable
the non-atomic i_count, and the bucket locks for the inode hash, if
you're eager enough to look into that area.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] vfs: avoid taking locks if inode not in lists
2011-07-26 9:36 ` Eric Dumazet
2011-07-26 9:42 ` Christoph Hellwig
@ 2011-07-27 15:21 ` Eric Dumazet
2011-07-27 17:12 ` Andi Kleen
2011-07-27 20:44 ` Christoph Hellwig
1 sibling, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-27 15:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
Le mardi 26 juillet 2011 à 11:36 +0200, Eric Dumazet a écrit :
> Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit :
> > On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> > > Well, not 'last' contention point, as we still hit remove_inode_hash(),
> >
> > There should be no ned to put pipe or anon inodes on the inode hash.
> > Probably sockets don't need it either, but I'd need to look at it in
> > detail.
> >
> > > inode_wb_list_del()
> >
> > The should never be on the wb list either, doing an unlocked check for
> > actually beeing on the list before taking the lock should help you.
>
> Yes, it might even help regular inodes ;)
>
> >
> > > inode_lru_list_del(),
> >
> > No real need to keep inodes in the LRU if we only allocate them using
> > new_inode but never look them up either. You might want to try setting
> > .drop_inode to generic_delete_inode for these.
>
> Yes, I'll take a look, thanks.
If I am not mistaken, we can add unlocked checks on the three hot spots.
After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
my dev machine takes ~3us instead of ~9us.
Maybe its better to split it in three patches, just let me know.
22us -> 3us, thats a nice patch series ;)
Thanks
[PATCH] vfs: avoid taking locks if inode not in lists
sockets and pipes inodes destruction hits three possibly contended
locks :
system-wide inode_hash_lock in remove_inode_hash()
superblock s_inode_lru_lock in inode_lru_list_del()
bdi wb.list_lock in inode_wb_list_del()
Before even taking locks, we can perform an unlocked test to check if
inode can possibly be in the lists.
On a 2x4x2 machine, a close(socket()) pair can be 200% faster with these
changes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
fs/fs-writeback.c | 10 ++++++----
fs/inode.c | 6 ++++++
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1599aa9..8b90bdb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -182,11 +182,13 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
*/
void inode_wb_list_del(struct inode *inode)
{
- struct backing_dev_info *bdi = inode_to_bdi(inode);
+ if (!list_empty(&inode->i_wb_list)) {
+ struct backing_dev_info *bdi = inode_to_bdi(inode);
- spin_lock(&bdi->wb.list_lock);
- list_del_init(&inode->i_wb_list);
- spin_unlock(&bdi->wb.list_lock);
+ spin_lock(&bdi->wb.list_lock);
+ list_del_init(&inode->i_wb_list);
+ spin_unlock(&bdi->wb.list_lock);
+ }
}
/*
diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..796a420 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -338,6 +338,9 @@ static void inode_lru_list_add(struct inode *inode)
static void inode_lru_list_del(struct inode *inode)
{
+ if (list_empty(&inode->i_lru))
+ return;
+
spin_lock(&inode->i_sb->s_inode_lru_lock);
if (!list_empty(&inode->i_lru)) {
list_del_init(&inode->i_lru);
@@ -406,6 +409,9 @@ EXPORT_SYMBOL(__insert_inode_hash);
*/
void remove_inode_hash(struct inode *inode)
{
+ if (inode_unhashed(inode))
+ return;
+
spin_lock(&inode_hash_lock);
spin_lock(&inode->i_lock);
hlist_del_init(&inode->i_hash);
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists
2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
@ 2011-07-27 17:12 ` Andi Kleen
2011-07-27 20:44 ` Christoph Hellwig
1 sibling, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2011-07-27 17:12 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
> Before even taking locks, we can perform an unlocked test to check if
> inode can possibly be in the lists.
>
> On a 2x4x2 machine, a close(socket()) pair can be 200% faster with these
> changes.
Great result! Seems simple and obvious to me.
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists
2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-27 17:12 ` Andi Kleen
@ 2011-07-27 20:44 ` Christoph Hellwig
2011-07-27 20:59 ` Andi Kleen
` (2 more replies)
1 sibling, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-27 20:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote:
> If I am not mistaken, we can add unlocked checks on the three hot spots.
>
> After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
> my dev machine takes ~3us instead of ~9us.
>
> Maybe its better to split it in three patches, just let me know.
I think three patches would be a lot cleaner.
As for safety of the unlocked checks:
- inode are either hashed when created or never, so that one looks
fine.
- same for the sb list.
- the writeback list is a bit more dynamic as we move things around
quite a bit. But in additon to the inode_wb_list_del call from
evict() it only ever gets remove in writeback_single_inode, which
for a freeing inode can only be called from the callers of evict().
Btw, I wonder if you should micro-optimize things a bit further by
moving the unhashed checks from the deletion functions into the callers
and thus save a function call for each of them.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists
2011-07-27 20:44 ` Christoph Hellwig
@ 2011-07-27 20:59 ` Andi Kleen
2011-07-27 21:01 ` Christoph Hellwig
2011-07-28 4:41 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-28 4:55 ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet
2 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2011-07-27 20:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Eric Dumazet, Tim Chen, Al Viro, David Miller, Andi Kleen,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
> Btw, I wonder if you should micro-optimize things a bit further by
> moving the unhashed checks from the deletion functions into the callers
> and thus save a function call for each of them.
If the caller is in the same file modern gcc is able to do that automatically
if you're lucky enough ("partial inlining")
I would not uglify the code for it.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists
2011-07-27 20:59 ` Andi Kleen
@ 2011-07-27 21:01 ` Christoph Hellwig
2011-07-28 4:11 ` [PATCH] vfs: conditionally call inode_wb_list_del() Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-07-27 21:01 UTC (permalink / raw)
To: Andi Kleen
Cc: Christoph Hellwig, Eric Dumazet, Tim Chen, Al Viro, David Miller,
Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
linux-fsdevel, netdev
On Wed, Jul 27, 2011 at 10:59:57PM +0200, Andi Kleen wrote:
> > Btw, I wonder if you should micro-optimize things a bit further by
> > moving the unhashed checks from the deletion functions into the callers
> > and thus save a function call for each of them.
>
> If the caller is in the same file modern gcc is able to do that automatically
> if you're lucky enough ("partial inlining")
>
> I would not uglify the code for it.
Depending on how you look at it the code might actually be a tad
cleaner. One of called functions is outside of inode.c.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] vfs: conditionally call inode_wb_list_del()
2011-07-27 21:01 ` Christoph Hellwig
@ 2011-07-28 4:11 ` Eric Dumazet
0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-28 4:11 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andi Kleen, Tim Chen, Al Viro, David Miller, Matthew Wilcox,
Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
Le mercredi 27 juillet 2011 à 17:01 -0400, Christoph Hellwig a écrit :
> On Wed, Jul 27, 2011 at 10:59:57PM +0200, Andi Kleen wrote:
> > > Btw, I wonder if you should micro-optimize things a bit further by
> > > moving the unhashed checks from the deletion functions into the callers
> > > and thus save a function call for each of them.
> >
> > If the caller is in the same file modern gcc is able to do that automatically
> > if you're lucky enough ("partial inlining")
> >
> > I would not uglify the code for it.
>
> Depending on how you look at it the code might actually be a tad
> cleaner. One of called functions is outside of inode.c.
>
Thats right, thanks again for your valuable input Christoph.
The following is a clear win, since we avoid the call to external
function.
[PATCH] vfs: conditionally call inode_wb_list_del()
Some inodes (pipes, sockets, ...) are not in bdi writeback list.
evict() can avoid calling inode_wb_list_del() and its expensive spinlock
by checking inode i_wb_list being empty or not.
At this point, no other cpu/user can concurrently manipulate this inode
i_wb_list
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
fs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..9dab13a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -454,7 +454,9 @@ static void evict(struct inode *inode)
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(!list_empty(&inode->i_lru));
- inode_wb_list_del(inode);
+ if (!list_empty(&inode->i_wb_list))
+ inode_wb_list_del(inode);
+
inode_sb_list_del(inode);
if (op->evict_inode) {
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists
2011-07-27 20:44 ` Christoph Hellwig
2011-07-27 20:59 ` Andi Kleen
@ 2011-07-28 4:41 ` Eric Dumazet
2011-07-28 4:55 ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet
2 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-28 4:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
Le mercredi 27 juillet 2011 à 16:44 -0400, Christoph Hellwig a écrit :
> On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote:
> > If I am not mistaken, we can add unlocked checks on the three hot spots.
> >
> > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
> > my dev machine takes ~3us instead of ~9us.
> >
> > Maybe its better to split it in three patches, just let me know.
>
> I think three patches would be a lot cleaner.
>
> As for safety of the unlocked checks:
>
> - inode are either hashed when created or never, so that one looks
> fine.
> - same for the sb list.
> - the writeback list is a bit more dynamic as we move things around
> quite a bit. But in additon to the inode_wb_list_del call from
> evict() it only ever gets remove in writeback_single_inode, which
> for a freeing inode can only be called from the callers of evict().
>
> Btw, I wonder if you should micro-optimize things a bit further by
> moving the unhashed checks from the deletion functions into the callers
> and thus save a function call for each of them.
>
What about following patch, addressing the micro-optimization and Andi
Kleen concern about evict() readability ?
Thanks !
[PATCH] vfs: avoid taking inode_hash_lock on pipes and sockets
Some inodes (pipes, sockets, ...) are not hashed, no need to take
contended inode_hash_lock at dismantle time.
nice speedup on SMP machines on socket intensive workloads.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
fs/inode.c | 6 +++---
include/linux/fs.h | 9 ++++++++-
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..73b5598 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -399,12 +399,12 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
EXPORT_SYMBOL(__insert_inode_hash);
/**
- * remove_inode_hash - remove an inode from the hash
+ * __remove_inode_hash - remove an inode from the hash
* @inode: inode to unhash
*
* Remove an inode from the superblock.
*/
-void remove_inode_hash(struct inode *inode)
+void __remove_inode_hash(struct inode *inode)
{
spin_lock(&inode_hash_lock);
spin_lock(&inode->i_lock);
@@ -412,7 +412,7 @@ void remove_inode_hash(struct inode *inode)
spin_unlock(&inode->i_lock);
spin_unlock(&inode_hash_lock);
}
-EXPORT_SYMBOL(remove_inode_hash);
+EXPORT_SYMBOL(__remove_inode_hash);
void end_writeback(struct inode *inode)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f23bcb7..786b3b1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2317,11 +2317,18 @@ extern int should_remove_suid(struct dentry *);
extern int file_remove_suid(struct file *);
extern void __insert_inode_hash(struct inode *, unsigned long hashval);
-extern void remove_inode_hash(struct inode *);
static inline void insert_inode_hash(struct inode *inode)
{
__insert_inode_hash(inode, inode->i_ino);
}
+
+extern void __remove_inode_hash(struct inode *);
+static inline void remove_inode_hash(struct inode *inode)
+{
+ if (!inode_unhashed(inode))
+ __remove_inode_hash(inode);
+}
+
extern void inode_sb_list_add(struct inode *inode);
#ifdef CONFIG_BLOCK
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] vfs: avoid call to inode_lru_list_del() if possible
2011-07-27 20:44 ` Christoph Hellwig
2011-07-27 20:59 ` Andi Kleen
2011-07-28 4:41 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
@ 2011-07-28 4:55 ` Eric Dumazet
2 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2011-07-28 4:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
Le mercredi 27 juillet 2011 à 16:44 -0400, Christoph Hellwig a écrit :
> On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote:
> > If I am not mistaken, we can add unlocked checks on the three hot spots.
> >
> > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on
> > my dev machine takes ~3us instead of ~9us.
> >
> > Maybe its better to split it in three patches, just let me know.
>
> I think three patches would be a lot cleaner.
>
> As for safety of the unlocked checks:
>
> - inode are either hashed when created or never, so that one looks
> fine.
> - same for the sb list.
> - the writeback list is a bit more dynamic as we move things around
> quite a bit. But in additon to the inode_wb_list_del call from
> evict() it only ever gets remove in writeback_single_inode, which
> for a freeing inode can only be called from the callers of evict().
>
> Btw, I wonder if you should micro-optimize things a bit further by
> moving the unhashed checks from the deletion functions into the callers
> and thus save a function call for each of them.
>
Here is the last patch, addressing inode_lru_list_del() call.
Only the call done from iput_final() can obviously benefit from checking
i_lru being empty or not, so it makes sense to perform the check at
caller site instead of doing it in inode_lru_list_del()
[PATCH] vfs: avoid call to inode_lru_list_del() if possible
inode_lru_list_del() is expensive because of per superblock lru locking,
while some inodes are not in lru list.
Adding a check in iput_final() can speedup pipe/sockets workloads on
SMP.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
fs/inode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index d0c72ff..b8b8939 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1328,7 +1328,8 @@ static void iput_final(struct inode *inode)
}
inode->i_state |= I_FREEING;
- inode_lru_list_del(inode);
+ if (!list_empty(&inode->i_lru))
+ inode_lru_list_del(inode);
spin_unlock(&inode->i_lock);
evict(inode);
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-07-28 4:55 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20110717010427.GC5359@parisc-linux.org>
[not found] ` <20110717084630.GC8006@one.firstfloor.org>
[not found] ` <1310977028.5756.1.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
[not found] ` <20110718154008.GA29593@infradead.org>
[not found] ` <20110718155141.GA11013@ZenIV.linux.org.uk>
[not found] ` <1311093158.2707.75.camel@schen9-DESK>
[not found] ` <20110721204042.GB31405@ZenIV.linux.org.uk>
[not found] ` <1311294452.2576.18.camel@schen9-DESK>
[not found] ` <20110723132411.GA22183@infradead.org>
[not found] ` <1311633550.2576.33.camel@schen9-DESK>
[not found] ` <20110725225154.GD22133@ZenIV.linux.org.uk>
[not found] ` <1311636178.2576.34.camel@schen9-DESK>
[not found] ` <1311660013.2996.6.camel@edumazet-laptop>
2011-07-26 8:21 ` [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Eric Dumazet
2011-07-26 9:03 ` Christoph Hellwig
2011-07-26 9:36 ` Eric Dumazet
2011-07-26 9:42 ` Christoph Hellwig
2011-07-26 10:43 ` Eric Dumazet
2011-07-26 11:49 ` Christoph Hellwig
2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-27 17:12 ` Andi Kleen
2011-07-27 20:44 ` Christoph Hellwig
2011-07-27 20:59 ` Andi Kleen
2011-07-27 21:01 ` Christoph Hellwig
2011-07-28 4:11 ` [PATCH] vfs: conditionally call inode_wb_list_del() Eric Dumazet
2011-07-28 4:41 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-28 4:55 ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).