From: Waiman Long <waiman.long@hp.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jeff Layton <jlayton@redhat.com>,
	Miklos Szeredi <mszeredi@suse.cz>, Ian Kent <raven@themaw.net>,
	Sage Weil <sage@inktank.com>, Steve French <sfrench@samba.org>,
	Trond Myklebust <Trond.Myklebust@netapp.com>,
	Eric Paris <eparis@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	autofs@vger.kernel.org, ceph-devel@vger.kernel.org,
	linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	"Chandramouleeswaran, Aswin" <aswin@hp.com>,
	"Norton, Scott J" <scott.norton@hp.com>,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 0/3 v3] dcache: make it more scalable on large system
Date: Thu, 23 May 2013 17:34:23 -0400	[thread overview]
Message-ID: <519E8B5F.3080905@hp.com> (raw)
In-Reply-To: <20130523094201.GA24543@dastard>

On 05/23/2013 05:42 AM, Dave Chinner wrote:
> On Wed, May 22, 2013 at 09:37:25PM -0400, Waiman Long wrote:
>> Change log:
>>
>> v2->v3
>>   - Fix the RCU lock problem found by Al Viro.
>>   - Rebase the code to the latest v3.10-rc1 linux mainline.
>>   - Remove patch 4 which may be problematic if the dentry is deleted.
>>   - Rerun performance measurement using 3.10-rc1 kernel.
>>
>> v1->v2
>>   - Include performance improvement in the AIM7 benchmark results because
>>     of this patch.
>>   - Modify dget_parent() to avoid taking the lock, if possible, to further
>>     improve AIM7 benchmark results.
>>
>> During some perf-record sessions of the kernel running the high_systime
>> workload of the AIM7 benchmark, it was found that quite a large portion
>> of the spinlock contention was due to the perf_event_mmap_event()
>> function itself. This perf kernel function calls d_path() which,
>> in turn, calls path_get() and dput() indirectly. These 3 functions
>> were the hottest functions shown in the perf-report output of
>> the _raw_spin_lock() function in an 8-socket system with 80 cores
>> (hyperthreading off) with a 3.10-rc1 kernel:
> What was it I said about this patchset when you posted it to speed
> up an Oracle benchmark back in February? I'll repeat:
>
> "Nobody should be doing reverse dentry-to-name lookups in a quantity
> sufficient for it to become a performance limiting factor."

Thanks for the comment, but my point is that the d_lock contention is
skewing the data about how much spinlock contention actually happened
in the workload, and it makes it harder to pinpoint problem areas to
look at. This is not about performance; it is about accurate
representation of performance data. Ideally, we want the overhead of
turning on perf instrumentation to be as low as possible.
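To make the contention source concrete: the perf mmap event ends up in
d_path(), which builds the full path by walking parent pointers and
prepending each name component. The sketch below is a hypothetical
userspace analogue of that prepend loop (simplified struct and function
names of my own, not the kernel's actual code); in the kernel it is
exactly this walk that must be protected against concurrent renames,
which is where the lock traffic comes from.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace analogue of a dentry: a name plus a parent link. */
struct dentry {
    const char *name;
    struct dentry *parent;        /* NULL marks the root */
};

/*
 * Build the full path by walking toward the root and prepending each
 * component into the tail of buf, mimicking the shape of the kernel's
 * d_path()/prepend_path() loop.  Returns a pointer to the start of the
 * assembled path inside buf.
 */
static char *reverse_path(struct dentry *d, char *buf, size_t len)
{
    char *end = buf + len;
    *--end = '\0';
    while (d->parent) {           /* stop once we reach the root dentry */
        size_t n = strlen(d->name);
        end -= n;
        memcpy(end, d->name, n);
        *--end = '/';
        d = d->parent;
    }
    return end;
}
```

Each perf sample that needs a path repeats this whole walk, so with many
CPUs sampling at once the protection around it becomes the bottleneck.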


>
> And that makes whatever that tracepoint is doing utterly stupid.
> Dumping out full paths in a tracepoint is hardly "low overhead", and
> that tracepoint should be stomped on from a great height. Sure,
> print the filename straight out of the dentry into a tracepoint,
> but repeatedly calculating the full path (probably for the same set of
> dentries) is just a dumb thing to do.
>
> Anyway, your numbers show that a single AIM7 microbenchmark goes
> better under tracing the specific mmap event that uses d_path(), but
> the others are on average a little bit slower. That doesn't convince
> me that this is worth doing. Indeed, what happens to performance
> when you aren't tracing?
>
> Indeed, have you analysed what makes that
> microbenchmark contend so much on the dentry lock while reverse
> lookups are occurring? Dentry lock contention does not necessarily
> indicate a problem with the dentry locks, and without knowing why
> there is contention occurring (esp. compared to the other benchmarks)
> we can't really determine if this is a good solution or not...

What made it contend so much was the large number of CPUs available in
my test system, which is an 8-socket Westmere EX machine with 80 cores.
As perf was collecting data from every core, the threads unavoidably
bumped into each other translating dentries back to their full paths.
The current code only allows one CPU in the d_path() critical path at a
time; my patch allows all of them to be in the critical path
concurrently.
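The concurrency the patch series aims for is the classic
sequence-counter reader pattern: readers run in parallel and only retry
when a writer got in the way. The sketch below is my own minimal
userspace illustration of that pattern (using C11 atomics, with made-up
names), not the actual sequence read/write lock implementation from
patch 2.

```c
#include <stdatomic.h>

/* Minimal seqcount-style primitive: writers bump the counter to odd on
 * entry and back to even on exit; readers sample it before and after. */
struct seqcount { atomic_uint seq; };

static unsigned read_begin(struct seqcount *s)
{
    unsigned seq;
    while ((seq = atomic_load(&s->seq)) & 1)
        ;                         /* a writer is mid-update; wait */
    return seq;
}

static int read_retry(struct seqcount *s, unsigned start)
{
    /* Nonzero means a writer ran during the read; the data may be torn. */
    return atomic_load(&s->seq) != start;
}

static void write_begin(struct seqcount *s) { atomic_fetch_add(&s->seq, 1); }
static void write_end(struct seqcount *s)   { atomic_fetch_add(&s->seq, 1); }
```

With this shape, any number of CPUs can execute the read side of a path
walk at once; only a rename (the writer) forces them to redo the walk.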

The upcoming Ivy Bridge EX can have up to 15 cores per socket, so even
a 4-socket machine will have up to 60 cores, or 120 virtual CPUs with
hyperthreading turned on.

> IOWs, you need more than one microbenchmark that interacts with
> some naive monitoring code to justify the complexity these locking
> changes introduce....

The first patch can also independently improve the performance of a
number of AIM7 workloads, including an almost 7X improvement in the
short workload. More detailed information on these performance benefits
is given in the patch description of the first patch. I will try to
collect more performance improvement data on other workloads too.
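For reference, the technique behind that first patch (avoiding the
spinlock for routine d_count updates) can be sketched as a
compare-and-swap reference bump that only falls back to the lock when
the count has dropped to zero. This is a simplified userspace sketch
with hypothetical names, not the patch itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object whose refcount is normally guarded by a spinlock;
 * the lock itself is omitted here, the fallback return stands in for it. */
struct obj {
    atomic_int count;
};

/*
 * Try to take a reference without the lock: cmpxchg the count up by one
 * as long as it is positive.  A zero count means the object may be on
 * its way to being freed, so the caller must take the locked slow path.
 */
static bool get_ref_lockless(struct obj *o)
{
    int c = atomic_load(&o->count);
    while (c > 0) {
        if (atomic_compare_exchange_weak(&o->count, &c, c + 1))
            return true;      /* reference taken, no lock touched */
        /* c was reloaded by the failed cmpxchg; loop and retry */
    }
    return false;             /* count hit 0: caller falls back to the lock */
}
```

The common case (count already positive) never touches the lock at all,
which is where the AIM7 short-workload gain would come from.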

Thanks for the review.
Longman

  reply	other threads:[~2013-05-23 21:34 UTC|newest]

Thread overview: 42+ messages
2013-05-23  1:37 [PATCH 0/3 v3] dcache: make it more scalable on large system Waiman Long
2013-05-23  1:37 ` [PATCH 1/3 v3] dcache: Don't take unnecessary lock in d_count update Waiman Long
2013-05-23  1:37   ` Waiman Long
2013-05-23  1:37 ` Waiman Long
2013-05-23  1:37 ` [PATCH 2/3 v3] dcache: introduce a new sequence read/write lock type Waiman Long
2013-05-23  1:37   ` Waiman Long
2013-05-23  1:37 ` Waiman Long
2013-05-23  1:37 ` [PATCH 3/3 v3] dcache: change rename_lock to a sequence read/write lock Waiman Long
2013-05-23  1:37 ` Waiman Long
2013-05-23  1:37   ` Waiman Long
2013-05-23  9:42 ` [PATCH 0/3 v3] dcache: make it more scalable on large system Dave Chinner
2013-05-23 21:34   ` Waiman Long [this message]
2013-05-23 21:34     ` Waiman Long
2013-05-27  2:09     ` Dave Chinner
2013-05-29 15:55       ` Waiman Long
     [not found]         ` <51A624E2.3000301-VXdhtT5mjnY@public.gmane.org>
2013-05-29 16:13           ` Andi Kleen
2013-05-29 16:13             ` Andi Kleen
     [not found]             ` <20130529161358.GJ6123-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2013-05-29 20:23               ` Waiman Long
2013-05-29 20:23                 ` Waiman Long
2013-05-29 16:18           ` Simo Sorce
2013-05-29 16:18             ` Simo Sorce
2013-05-29 16:56             ` Andi Kleen
2013-05-29 17:03               ` Simo Sorce
2013-05-29 20:37               ` Waiman Long
     [not found]             ` <1369844289.2769.146.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
2013-05-29 20:32               ` Waiman Long
2013-05-29 20:32                 ` Waiman Long
2013-05-29 18:46           ` J. Bruce Fields
2013-05-29 18:46             ` J. Bruce Fields
2013-05-29 20:37             ` Andi Kleen
     [not found]               ` <20130529203700.GM6123-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2013-05-29 20:43                 ` J. Bruce Fields
2013-05-29 20:43                   ` J. Bruce Fields
2013-05-29 21:01                   ` Andi Kleen
2013-05-29 21:19               ` Jörn Engel
2013-05-29 21:19                 ` Jörn Engel
2013-05-30 15:48                 ` Waiman Long
2013-05-30 15:48                   ` Waiman Long
2013-05-30 15:11                   ` Jörn Engel
2013-05-30 15:11                     ` Jörn Engel
2013-06-06  3:48               ` Dave Chinner
2013-05-29 20:40             ` Waiman Long
2013-05-23  1:37 Waiman Long
2013-05-23  1:37 ` Waiman Long
