linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.10-mm3 scaling problem with inotify
@ 2005-01-13 23:56 John Hawkes
  2005-01-14  0:49 ` Robert Love
  0 siblings, 1 reply; 11+ messages in thread
From: John Hawkes @ 2005-01-13 23:56 UTC (permalink / raw)
  To: linux-kernel, rml

I believe I've encountered a scaling problem with the inotify code in
2.6.10-mm3.

vfs_read() and vfs_write() (for example) do:
    dnotify_parent(dentry, DN_ACCESS);
    inotify_dentry_parent_queue_event(dentry,
                           IN_ACCESS, 0, dentry->d_name.name);
    inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
for the optional "notify" functionality.

dnotify_parent() knows how to exit quickly:
       if (!dir_notify_enable)
                return;
looking at a global flag, and inotify_inode_queue_event() also knows a quick
exit:
        if (!inode->inotify_data)
                return;
However, inotify_dentry_parent_queue_event() first has to get the parent 
dentry:
        parent = dget_parent(dentry);
        inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
        dput(parent);
and, sadly, dget_parent(dentry) grabs a spinlock (dentry->d_lock) and dirties a cacheline (dentry->dcount).  I have an AIM7-like workload that does numerous
concurrent reads and suffers a 40% drop in performance because of this,
primarily due to spinlock contention.

Is it possible for a parent's inode->inotify_data to be enabled when none of 
its children's inotify_data are enabled?  That would make it easy - just look 
at the current inode's inotify_data before walking back to the parent.

John Hawkes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 2.6.10-mm3 scaling problem with inotify
  2005-01-13 23:56 2.6.10-mm3 scaling problem with inotify John Hawkes
@ 2005-01-14  0:49 ` Robert Love
  2005-01-14  1:31   ` John McCutchan
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Love @ 2005-01-14  0:49 UTC (permalink / raw)
  To: John Hawkes; +Cc: linux-kernel, John McCutchan

On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:

> I believe I've encountered a scaling problem with the inotify code in
> 2.6.10-mm3.
> 
> vfs_read() and vfs_write() (for example) do:
>     dnotify_parent(dentry, DN_ACCESS);
>     inotify_dentry_parent_queue_event(dentry,
>                            IN_ACCESS, 0, dentry->d_name.name);
>     inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
> for the optional "notify" functionality.
> 
> dnotify_parent() knows how to exit quickly:
>        if (!dir_notify_enable)
>                 return;

This isn't a "quick exit", though.  It is just a termination check in
case dnotify was disabled on boot.  The rest of dnotify_parent() is
always executed and it does the equivalent of dget_parent().

So why is inotify showing up on your test and not dnotify?

Hm, dnotify always grabs the lock but does not bump dentry->count unless
there is actually a watch on the dentry.  Could that really be the
difference and cause of the slowdown?  We could probably do that, too.

> Is it possible for a parent's inode->inotify_data to be enabled when none of 
> its children's inotify_data are enabled?  That would make it easy - just look 
> at the current inode's inotify_data before walking back to the parent.

Unfortunately, no.  There is no relationship between the parent and the
child inode's inotify_data structure.  The best we can do is exactly
what dnotify does, actually, which is

	spin_lock(&dentry->d_lock);
	parent = dentry->d_parent;
	if (parent->d_inode->i_dnotify_mask & event) {
		dget(parent);
		spin_unlock(&dentry->d_lock);
		__inode_dir_notify(parent->d_inode, event);
		dput(parent);
	} else {
		spin_unlock(&dentry->d_lock);
	}

Instead of our current "explicit"

	parent = dget_parent(dentry);
	inotify_inode_queue_event(parent->d_inode, mask, cookie,
				  filename);
	dput(parent);

E.g., save the ref unless absolutely needed.

I am open to other ideas, too, but I don't see any nice shortcuts like
what we can do in inotify_inode_queue_event().

(Other) John?  Any ideas?

Best,

	Robert Love



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 2.6.10-mm3 scaling problem with inotify
  2005-01-14  0:49 ` Robert Love
@ 2005-01-14  1:31   ` John McCutchan
  2005-01-14  2:29     ` Robert Love
  0 siblings, 1 reply; 11+ messages in thread
From: John McCutchan @ 2005-01-14  1:31 UTC (permalink / raw)
  To: Robert Love; +Cc: John Hawkes, linux-kernel

On Thu, 2005-01-13 at 19:49 -0500, Robert Love wrote:
> On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:
> [snip]
> 
> I am open to other ideas, too, but I don't see any nice shortcuts like
> what we can do in inotify_inode_queue_event().
> 
> (Other) John?  Any ideas?

No, you covered things well. This code was really just a straight copy
of the dnotify code. Rob cleaned it up at some point, giving us what we
have today. The only fix I can think of is the one suggested by Rob --
copying the dnotify code again.

-- 
John McCutchan <ttb@tentacle.dhs.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 2.6.10-mm3 scaling problem with inotify
  2005-01-14  1:31   ` John McCutchan
@ 2005-01-14  2:29     ` Robert Love
  2005-01-14 18:05       ` John Hawkes
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Love @ 2005-01-14  2:29 UTC (permalink / raw)
  To: John McCutchan, John Hawkes; +Cc: linux-kernel

On Thu, 2005-01-13 at 20:31 -0500, John McCutchan wrote:

> No, you covered things well. This code was really just a straight copy
> of the dnotify code. Rob cleaned it up at some point, giving us what we
> have today. The only fix I can think of is the one suggested by Rob --
> copying the dnotify code again.

Oh, did I undo the dnotify optimization? :-)

Eh, no, looks like not exactly: The old code always did a dget().

John Hawkes: Attached patch implements the dget() optimization in the
same manner as dnotify.  Compile-tested but not booted.

Let me know!

	Robert Love


inotify: don't do dget() unless we have to

 drivers/char/inotify.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff -urN linux-2.6.10-mm3/drivers/char/inotify.c linux/drivers/char/inotify.c
--- linux-2.6.10-mm3/drivers/char/inotify.c	2005-01-13 21:04:53.108971856 -0500
+++ linux/drivers/char/inotify.c	2005-01-13 21:25:20.052448096 -0500
@@ -607,10 +607,18 @@
 				       u32 cookie, const char *filename)
 {
 	struct dentry *parent;
+	struct inode *inode;
 
-	parent = dget_parent(dentry);
-	inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
-	dput(parent);
+	spin_lock(&dentry->d_lock);
+	parent = dentry->d_parent;
+	inode = parent->d_inode;
+	if (inode->inotify_data) {
+		dget(parent);
+		spin_unlock(&dentry->d_lock);
+		inotify_inode_queue_event(inode, mask, cookie, filename);
+		dput(parent);
+	} else
+		spin_unlock(&dentry->d_lock);
 }
 EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 2.6.10-mm3 scaling problem with inotify
  2005-01-14  2:29     ` Robert Love
@ 2005-01-14 18:05       ` John Hawkes
  2005-01-14 18:15         ` Robert Love
  0 siblings, 1 reply; 11+ messages in thread
From: John Hawkes @ 2005-01-14 18:05 UTC (permalink / raw)
  To: Robert Love, John McCutchan, John Hawkes; +Cc: linux-kernel

From: "Robert Love" <rml@novell.com>
...
> John Hawkes: Attached patch implements the dget() optimization in the
> same manner as dnotify.  Compile-tested but not booted.
>
> Let me know!
...
> inotify: don't do dget() unless we have to
>
>  drivers/char/inotify.c |   14 +++++++++++---
>  1 files changed, 11 insertions(+), 3 deletions(-)

The patch shows a substantial 4x improvement for my specific workload (@64p),
although I can still envision workloads that will stimulate high spinlock
contention from spin_lock(&dentry->d_lock).  First things first -- let's get
this fix into the -mm tree.  Thanks!

John Hawkes


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 2.6.10-mm3 scaling problem with inotify
  2005-01-14 18:05       ` John Hawkes
@ 2005-01-14 18:15         ` Robert Love
  2005-01-14 18:39           ` John Hawkes
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Love @ 2005-01-14 18:15 UTC (permalink / raw)
  To: John Hawkes; +Cc: John McCutchan, John Hawkes, linux-kernel

On Fri, 2005-01-14 at 10:05 -0800, John Hawkes wrote:

> The patch shows a substantial 4x improvement for my specific workload (@64p),
> although I can still envision workloads that will stimulate high spinlock
> contention from spin_lock(&dentry->d_lock).  First things first -- let's get
> this fix into the -mm tree.  Thanks!

Glad it worked!

Does the spin_lock() show up in dnotify's code path?  It is getting
called, unconditionally, there, too.

	Robert Love



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 2.6.10-mm3 scaling problem with inotify
  2005-01-14 18:15         ` Robert Love
@ 2005-01-14 18:39           ` John Hawkes
  0 siblings, 0 replies; 11+ messages in thread
From: John Hawkes @ 2005-01-14 18:39 UTC (permalink / raw)
  To: Robert Love; +Cc: John McCutchan, John Hawkes, linux-kernel

From: "Robert Love" <rml@novell.com>
> On Fri, 2005-01-14 at 10:05 -0800, John Hawkes wrote:
>
> > The patch shows a substantial 4x improvement for my specific workload
(@64p),
> > although I can still envision workloads that will stimulate high spinlock
> > contention from spin_lock(&dentry->d_lock).  First things first -- let's
get
> > this fix into the -mm tree.  Thanks!
>
> Glad it worked!
>
> Does the spin_lock() show up in dnotify's code path?  It is getting
> called, unconditionally, there, too.

No, it doesn't show up (with this workload).
Lock contention and cacheline contention can be complex.  Contention in one
area can alter the timing of other areas and reduce contention.  Anyway, with
your inotify_dentry_parent_queue_event() patch, the PC sampling for my
workload is now (with the lock waits getting rolled up to the callers) for
everything above 0.1% is:

 12:38pm  Samples: 1887402, Mode: pc,  load average: 1832.89, 1722.88, 1264.14

     COUNT (  % )     ADDR NAME
    843466 (44.7) a000000100147ba0 vma_adjust
    521940 (27.7) a0000001001131a0 USER
    278285 (14.7) a000000100150d20 anon_vma_unlink
     26952 ( 1.4) a000000100150ba0 anon_vma_link
     26099 ( 1.4) a00000010013c1e0 clear_page_range
     16638 ( 0.9) a0000001007473e0 established_get_next
     12378 ( 0.7) a00000010044b200 _mcount
     12150 ( 0.6) a000000100039d20 ia64_brk
     10237 ( 0.5) a00000010013dba0 zap_pte_range
      9886 ( 0.5) a00000010044aba0 copy_page
      9311 ( 0.5) a00000010013cfe0 copy_pte_range
      8239 ( 0.4) a00000010014bcc0 do_munmap
      5014 ( 0.3) a0000001001c1340 update_atime
      4743 ( 0.3) a00000010044b300 __copy_user
      4462 ( 0.2) a0000001004dc360 inotify_dentry_parent_queue_event
      4439 ( 0.2) a000000100115020 find_get_page
      3602 ( 0.2) a0000001000b0e40 copy_mm
      3581 ( 0.2) a0000001000d3d80 del_timer_sync
      2951 ( 0.2) a0000001001720e0 vfs_read

John Hawkes


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: 2.6.10-mm3 scaling problem with inotify
  2005-01-14  2:31 Zou, Nanhai
  2005-01-14  2:36 ` John McCutchan
@ 2005-01-14  2:41 ` Robert Love
  1 sibling, 0 replies; 11+ messages in thread
From: Robert Love @ 2005-01-14  2:41 UTC (permalink / raw)
  To: Zou, Nanhai; +Cc: John McCutchan, linux-kernel, hawkes

On Fri, 2005-01-14 at 10:31 +0800, Zou, Nanhai wrote:

(your email wraps badly)

>  There is still a little difference between your implement in
>  inotify_dentry_parent_queue_event from dnotify_parent
>  
>  In dnotify_parent, if parent is not watching the event, the code will
> not fall
>  through dget and dput path.
>  
>  While in inotify_dentry_parent_queue_event kernel will go dget and dput
> even
>  if (inode->inotify_data == NULL).
>  
>  While dget and dput will introduce a lot of atomic operations..
>  And the most important, dput will grab global dcache_lock...,
>  I think that is the reason why John Hawkes saw great performance drop.
>  
>  Simply follow dnotify_parent, only call dget and dput when
> inode->inotify_data != NULL will solve this problem I think.

This is what I meant in the original email.  This is the dnotify
difference I was talking about.

The patch I submitted should put the two in parity.

	Robert Love



^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: 2.6.10-mm3 scaling problem with inotify
  2005-01-14  2:31 Zou, Nanhai
@ 2005-01-14  2:36 ` John McCutchan
  2005-01-14  2:41 ` Robert Love
  1 sibling, 0 replies; 11+ messages in thread
From: John McCutchan @ 2005-01-14  2:36 UTC (permalink / raw)
  To: Zou, Nanhai; +Cc: linux-kernel, Robert Love, hawkes

On Fri, 2005-01-14 at 10:31 +0800, Zou, Nanhai wrote:
>  
>  There is still a little difference between your implement in
>  inotify_dentry_parent_queue_event from dnotify_parent
>  
>  In dnotify_parent, if parent is not watching the event, the code will
> not fall
>  through dget and dput path.
>  
>  While in inotify_dentry_parent_queue_event kernel will go dget and dput
> even
>  if (inode->inotify_data == NULL).
>  
>  While dget and dput will introduce a lot of atomic operations..
>  And the most important, dput will grab global dcache_lock...,
>  I think that is the reason why John Hawkes saw great performance drop.
>  
>  Simply follow dnotify_parent, only call dget and dput when
> inode->inotify_data != NULL will solve this problem I think.
> 

Yeah, the old code was written before inode->inotify_data existed.
Robert caught this before he sent his patch. His patch should fix this
regression.

-- 
John McCutchan <ttb@tentacle.dhs.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: 2.6.10-mm3 scaling problem with inotify
@ 2005-01-14  2:31 Zou, Nanhai
  2005-01-14  2:36 ` John McCutchan
  2005-01-14  2:41 ` Robert Love
  0 siblings, 2 replies; 11+ messages in thread
From: Zou, Nanhai @ 2005-01-14  2:31 UTC (permalink / raw)
  To: John McCutchan; +Cc: linux-kernel, rml, hawkes

> -----Original Message-----
> From: Zou, Nanhai
> Sent: Friday, January 14, 2005 10:28 AM
> To: 'John McCutchan'
> Subject: RE: 2.6.10-mm3 scaling problem with inotify
> 
> > From: linux-kernel-owner@vger.kernel.org
> > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of John
McCutchan
> > Sent: Friday, January 14, 2005 9:31 AM
> > To: Robert Love
> > Cc: John Hawkes; linux-kernel@vger.kernel.org
> > Subject: Re: 2.6.10-mm3 scaling problem with inotify
> >
> > On Thu, 2005-01-13 at 19:49 -0500, Robert Love wrote:
> > > On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:
> > > [snip]
> > >
> > > I am open to other ideas, too, but I don't see any nice shortcuts
like
> > > what we can do in inotify_inode_queue_event().
> > >
> > > (Other) John?  Any ideas?
> >
> > No, you covered things well. This code was really just a straight
copy
> > of the dnotify code. Rob cleaned it up at some point, giving us what
we
> > have today. The only fix I can think of is the one suggested by Rob
--
> > copying the dnotify code again.

 
 There is still a little difference between your implement in
 inotify_dentry_parent_queue_event from dnotify_parent
 
 In dnotify_parent, if parent is not watching the event, the code will
not fall
 through dget and dput path.
 
 While in inotify_dentry_parent_queue_event kernel will go dget and dput
even
 if (inode->inotify_data == NULL).
 
 While dget and dput will introduce a lot of atomic operations..
 And the most important, dput will grab global dcache_lock...,
 I think that is the reason why John Hawkes saw great performance drop.
 
 Simply follow dnotify_parent, only call dget and dput when
inode->inotify_data != NULL will solve this problem I think.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* 2.6.10-mm3 scaling problem with inotify
@ 2005-01-14  0:51 John Hawkes
  0 siblings, 0 replies; 11+ messages in thread
From: John Hawkes @ 2005-01-14  0:51 UTC (permalink / raw)
  To: linux-kernel, rml

I believe I've encountered a scaling problem with the inotify code in
2.6.10-mm3.

vfs_read() and vfs_write() (for example) do:
    dnotify_parent(dentry, DN_ACCESS);
    inotify_dentry_parent_queue_event(dentry,
                           IN_ACCESS, 0, dentry->d_name.name);
    inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
for the optional "notify" functionality.

dnotify_parent() knows how to exit quickly:
       if (!dir_notify_enable)
                return;
looking at a global flag, and inotify_inode_queue_event() also knows a quick
exit:
        if (!inode->inotify_data)
                return;
However, inotify_dentry_parent_queue_event() first has to get the parent 
dentry:
        parent = dget_parent(dentry);
        inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
        dput(parent);
and, sadly, dget_parent(dentry) grabs a spinlock (dentry->d_lock) and dirties a cacheline (dentry->dcount).  I have an AIM7-like workload that does numerous
concurrent reads and suffers a 40% drop in performance because of this.

Is it possible for a parent's inode->inotify_data to be enabled when none of 
its children's inotify_data are enabled?  That would make it easy - just look 
at the current inode's inotify_data before walking back to the parent.

John Hawkes

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-01-14 18:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-13 23:56 2.6.10-mm3 scaling problem with inotify John Hawkes
2005-01-14  0:49 ` Robert Love
2005-01-14  1:31   ` John McCutchan
2005-01-14  2:29     ` Robert Love
2005-01-14 18:05       ` John Hawkes
2005-01-14 18:15         ` Robert Love
2005-01-14 18:39           ` John Hawkes
2005-01-14  0:51 John Hawkes
2005-01-14  2:31 Zou, Nanhai
2005-01-14  2:36 ` John McCutchan
2005-01-14  2:41 ` Robert Love

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).