linux-kernel.vger.kernel.org archive mirror
From: Davidlohr Bueso <davidlohr@hp.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, dvhart@linux.intel.com, peterz@infradead.org,
	tglx@linutronix.de, paulmck@linux.vnet.ibm.com, efault@gmx.de,
	jeffm@suse.com, torvalds@linux-foundation.org, jason.low2@hp.com,
	Waiman.Long@hp.com, tom.vaden@hp.com, scott.norton@hp.com,
	aswin@hp.com
Subject: Re: [PATCH v5 0/4] futex: Wakeup optimizations
Date: Sun, 05 Jan 2014 16:59:27 -0800
Message-ID: <1388969967.4918.3.camel@buesod1.americas.hpqcorp.net>
In-Reply-To: <1388675120-8017-1-git-send-email-davidlohr@hp.com>

Folks, unless there's a reason not to do so, could we get this patchset in
for 3.14? We're already at -rc7 and we could benefit from more testing
in -next, until 3.13 is out.

Thanks,
Davidlohr

On Thu, 2014-01-02 at 07:05 -0800, Davidlohr Bueso wrote:
> Changes from v3/v4 [http://lkml.org/lkml/2013/12/19/627]:
>  - Almost completely redid patch 4, based on suggestions
>    by Linus. Instead of adding an atomic counter to keep
>    track of the plist size, couple the list's head empty
>    call with a check to see if the hb lock is locked.
>    This solves the race that motivated the use of the new
>    atomic field.
> 
>  - Fix grammar in patch 3
> 
>  - Fix SOB tags.
> 
> Changes from v2 [http://lwn.net/Articles/575449/]:
>  - Reordered SOB tags to reflect me as primary author.
> 
>  - Improved ordering guarantee comments for patch 4.
> 
>  - Rebased patch 4 against Linus' tree (this patch didn't
>    apply after the recent futex changes/fixes).
> 
> Changes from v1 [https://lkml.org/lkml/2013/11/22/525]:
>  - Removed patch "futex: Check for pi futex_q only once".
> 
>  - Cleaned up ifdefs for larger hash table.
> 
>  - Added a doc patch from tglx that describes the futex 
>    ordering guarantees.
> 
>  - Improved the lockless plist check for the wake calls.
>    Based on the community feedback, the necessary abstractions
>    and barriers are added to maintain ordering guarantees.
>    Code documentation is also updated.
> 
>  - Removed patch "sched,futex: Provide delayed wakeup list".
>    Based on feedback from PeterZ, I will look into this as
>    a separate issue once the other patches are settled.
> 
> We have been dealing with a customer database workload on a large
> 12TB, 240-core, 16-socket NUMA system that exhibits high amounts
> of contention on some of the locks that serialize internal futex
> data structures. This workload especially suffers in the wakeup
> paths, where waiting on the corresponding hb->lock can account for
> up to ~60% of the time. The result of such calls can mostly be
> classified as (i) nothing to wake up and (ii) waking up a large
> number of tasks.
> 
> Before these patches are applied, we can see this pathological behavior:
> 
>  37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
>             --- _raw_spin_lock
>              |
>              |--97.14%-- futex_wake
>              |          do_futex
>              |          sys_futex
>              |          system_call_fastpath
>              |          |
>              |          |--99.70%-- 0x7f383fbdea1f
>              |          |           yyy
> 
>  43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
>             --- _raw_spin_lock
>              |
>              |--53.74%-- futex_wake
>              |          do_futex
>              |          sys_futex
>              |          system_call_fastpath
>              |          |
>              |          |--99.40%-- 0x7fe7d44a4c05
>              |          |           zzz
>              |--45.90%-- futex_wait_setup
>              |          futex_wait
>              |          do_futex
>              |          sys_futex
>              |          system_call_fastpath
>              |          0x7fe7ba315789
>              |          syscall
> 
> 
> With these patches, contention is practically nonexistent:
> 
>  0.10%     49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
>                --- _raw_spin_lock
>                 |
>                 |--76.06%-- futex_wait_setup
>                 |          futex_wait
>                 |          do_futex
>                 |          sys_futex
>                 |          system_call_fastpath
>                 |          |
>                 |          |--99.90%-- 0x7f3165e63789
>                 |          |          syscall
>                            ...
>                 |--6.27%-- futex_wake
>                 |          do_futex
>                 |          sys_futex
>                 |          system_call_fastpath
>                 |          |
>                 |          |--54.56%-- 0x7f317fff2c05
>                 ...
> 
> Patch 1 is a cleanup.
> 
> Patch 2 addresses the well known issue of the global hash table.
> By creating a larger and NUMA aware table, we can reduce the false
> sharing and collisions, thus reducing the chance of different futexes
> sharing the same hb->lock.
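
To illustrate the sizing idea behind patch 2, here is a minimal user-space sketch. It is not the code from the patch: the helper names (roundup_pow2, table_size_for_cpus, bucket_index) and the per-CPU bucket factor passed in by the caller are assumptions made for the example. It only shows why scaling a power-of-two table with the CPU count spreads futexes over more buckets, and therefore over more hb->lock instances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two (hypothetical helper).
 * Assumes n is small enough that the shift cannot overflow. */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Scale the number of buckets with the CPU count. The per-CPU factor
 * is a parameter here because it is an assumption of this sketch. */
static size_t table_size_for_cpus(size_t ncpus, size_t buckets_per_cpu)
{
    return roundup_pow2(ncpus * buckets_per_cpu);
}

/* With a power-of-two table, the bucket is selected with a mask. */
static size_t bucket_index(uint32_t hash, size_t table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return hash & (table_size - 1);
}
```

For the 240-core machine described above, an assumed factor of 256 buckets per CPU would give table_size_for_cpus(240, 256) buckets, so two unrelated futexes are far less likely to hash to the same bucket than with a small fixed-size table.
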
> 
> Patch 3 documents the futex ordering guarantees.
> 
> Patch 4 reduces contention on the corresponding hb->lock by not trying to
> acquire it if there are no blocked tasks in the waitqueue.
> This particularly deals with point (i) above, where we see that it is not
> uncommon for up to 90% of wakeup calls to end up returning 0, indicating that no
> tasks were woken.
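
The shape of the patch 4 fast path can be sketched in user space as follows. This is a toy model under stated assumptions, not the kernel code: struct bucket, struct waiter and all bucket_* functions are hypothetical names, the lock is a trivial C11 spinlock, and sequentially consistent atomics stand in for the barriers that the real patch has to place explicitly. It follows the v5 description above: the waker treats the bucket as possibly having waiters if the list is non-empty or the lock is currently held, and only then takes the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a queued task and a futex hash bucket. */
struct waiter {
    struct waiter *next;
};

struct bucket {
    atomic_bool locked;          /* toy spinlock state */
    struct waiter *_Atomic head; /* queued waiters, NULL when empty */
};

static void bucket_init(struct bucket *b)
{
    atomic_init(&b->locked, false);
    atomic_init(&b->head, NULL);
}

static void bucket_lock(struct bucket *b)
{
    bool expected = false;

    /* Spin until we flip locked from false to true. */
    while (!atomic_compare_exchange_weak(&b->locked, &expected, true))
        expected = false;
}

static void bucket_unlock(struct bucket *b)
{
    assert(atomic_load(&b->locked));
    atomic_store(&b->locked, false);
}

/* Waiter side: the lock is taken before the waiter is queued. */
static void bucket_enqueue(struct bucket *b, struct waiter *w)
{
    bucket_lock(b);
    w->next = atomic_load(&b->head);
    atomic_store(&b->head, w);
    bucket_unlock(b);
}

/* True if a waiter is queued, or if someone holds the lock and may
 * therefore be in the middle of queueing one. */
static bool bucket_waiters_pending(struct bucket *b)
{
    return atomic_load(&b->locked) || atomic_load(&b->head) != NULL;
}

/* Waker side: returns the number of waiters removed from the bucket. */
static int bucket_wake_all(struct bucket *b)
{
    struct waiter *w;
    int woken = 0;

    if (!bucket_waiters_pending(b))
        return 0; /* fast path: the lock is never touched */

    bucket_lock(b);
    while ((w = atomic_load(&b->head)) != NULL) {
        atomic_store(&b->head, w->next);
        woken++;
    }
    bucket_unlock(b);
    return woken;
}
```

In this model a wake call on an empty, unlocked bucket returns 0 without acquiring the lock, which corresponds to case (i) above. Whether the check is safe in the kernel depends on the ordering guarantees documented in patch 3, which this sketch does not attempt to reproduce.
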
> 
> This patchset has also been tested on smaller systems for a variety of
> benchmarks, including Java workloads, kernel builds and custom bang-the-hell-out-of
> hb locks programs. So far, no functional or performance regressions have been seen.
> Furthermore, no issues were found when running the different tests in the futextest 
> suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/
> 
> This patchset applies on top of Linus' tree as of v3.13-rc6 (9a0bb296)
> 
> Special thanks to Scott Norton, Tom Vaden, Mark Ray and Aswin Chandramouleeswaran
> for help presenting, debugging and analyzing the data.
> 
>   futex: Misc cleanups
>   futex: Larger hash table
>   futex: Document ordering guarantees
>   futex: Avoid taking hb lock if nothing to wakeup
> 
>  kernel/futex.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 159 insertions(+), 38 deletions(-)
> 



Thread overview: 23+ messages
2014-01-02 15:05 [PATCH v5 0/4] futex: Wakeup optimizations Davidlohr Bueso
2014-01-02 15:05 ` [PATCH v5 1/4] futex: Misc cleanups Davidlohr Bueso
2014-01-11  6:43   ` Paul E. McKenney
2014-01-02 15:05 ` [PATCH v5 2/4] futex: Larger hash table Davidlohr Bueso
2014-01-11  7:37   ` Paul E. McKenney
2014-01-02 15:05 ` [PATCH v5 3/4] futex: Document ordering guarantees Davidlohr Bueso
2014-01-06 18:58   ` Darren Hart
2014-01-11  7:40   ` Paul E. McKenney
2014-01-02 15:05 ` [PATCH v5 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso
2014-01-02 19:23   ` Linus Torvalds
2014-01-02 20:59     ` Davidlohr Bueso
2014-01-06 20:56       ` Darren Hart
2014-01-06 20:52   ` Darren Hart
2014-01-07  3:29     ` Davidlohr Bueso
2014-01-07 17:40       ` Darren Hart
2014-01-11  9:49   ` Paul E. McKenney
2014-01-11  9:52     ` Paul E. McKenney
2014-01-11 18:21       ` Davidlohr Bueso
2014-01-06  0:59 ` Davidlohr Bueso [this message]
2014-01-06  1:38 ` [PATCH 5/4] futex: silence uninitialized warnings Davidlohr Bueso
2014-01-06 18:48   ` Darren Hart
2014-01-07  2:55   ` Linus Torvalds
2014-01-07  3:02     ` Davidlohr Bueso
