linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Maurer <bmaurer@fb.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Watson <davejwatson@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"Andy Lutomirski" <luto@amacapital.net>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>,
	"Paul Turner" <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Russell King" <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Andrew Hunter <ahh@google.com>,
	Andi Kleen <andi@firstfloor.org>, Chris Lameter <cl@linux.com>,
	rostedt <rostedt@goodmis.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"Will Deacon" <will.deacon@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [RFC PATCH v8 1/9] Restartable sequences system call
Date: Thu, 25 Aug 2016 17:56:25 +0000	[thread overview]
Message-ID: <1BD94294-23AD-40CC-8227-43AE1AD8092F@fb.com> (raw)
In-Reply-To: <545371402.19191.1472144912215.JavaMail.zimbra@efficios.com>

On 8/25/16, 10:08 AM, "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com> wrote:
> The most appealing application we have so far is Dave Watson's Facebook
>    services using jemalloc as a memory allocator. It would be nice if he
>    could re-run those benchmarks with my rseq implementation. The trade-offs
>    here are about speed and memory usage:
 
One piece of context I’d like to provide about our investigation into using rseq for malloc – I think that it’s really important to keep in mind that we’ve optimized the software that we write to survive well in a world where there’s a high memory cost for jemalloc for each thread and jemalloc is unable to have allocation caches as large as we would like. We’re not going to have real world benchmarks that show a magical improvement with rseq because over time we’ve baked the constraints of our environment into the design of our programs and optimized for the current set of APIs the kernel provides. I do think rseq provides a benefit even for applications optimized for today’s malloc implementations. But the real benefit is the new types of application designed that rseq enables and the ability for rseq to provide predictable performance for low-level APIs with much less investment from users. I’ll illustrate the costs that rseq would let us avoid with two examples of design choices we’ve made:

1) Because jemalloc uses a per-thread cache, threads that are sleeping have a high memory cost. For example, if you have a thread-pool with 100 threads but only 10 are used most of the time the other 90 threads will still have a dormant cache consuming memory. In order to combat this we have an abstraction called MemoryIdler (https://github.com/facebook/folly/blob/master/folly/detail/MemoryIdler.h) which is essentially a wrapper around futex that signals jemalloc to release its caches when the thread is idle. From what I can tell this is a practice that isn’t widely adopted even though it can save a substantial amount of memory – rseq makes this a moot point since caches can be per-cpu and the memory allocator does not need to worry about an idle thread hogging the cache.
2) The per-thread nature of malloc implementations has generally led people to avoid thread-per-request designs. Things like MemoryIdler can help you if a thread is going to be idle for seconds before it is used again, but if your thread makes a 100 ms RPC to another server clearing the cache is far too expensive to justify. But you still have a bunch of memory sitting around unused for 100ms. Multiply that by many concurrent requests and you are consuming a lot of memory. This has forced people to handle multiple requests in a single thread – this leads to problems of its own like a contested lock in one request impacting many other requests on the same thread. 

rseq opens up a whole world of algorithms to userspace – algorithms that are O(num CPUs) and where one can have an extremely fast fastpath at the cost of a slower slow path. Many of these algorithms are in use in the kernel today – per-cpu allocators, RCU, light-weight reader writer locks, etc. Even in cases where these APIs can be implemented today, a rseq implementation is often superior in terms of predictability and usability (eg per-thread counters consume more memory and are more expensive to read than per-cpu counters).

Isn’t the large number of uses of rseq-like algorithms in the kernel a pretty substantial sign that there would be demand for similar algorithms by user-space systems programmers?

-b

  reply	other threads:[~2016-08-25 17:58 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-19 20:07 [RFC PATCH v8 0/9] Restartable sequences system call Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 1/9] " Mathieu Desnoyers
2016-08-19 20:23   ` Linus Torvalds
2016-08-19 20:44     ` Josh Triplett
2016-08-19 20:59       ` Linus Torvalds
2016-08-19 20:56     ` Andi Kleen
2016-08-19 21:19       ` Paul E. McKenney
2016-08-19 21:32         ` Linus Torvalds
2016-08-19 23:35           ` Paul E. McKenney
2016-08-19 21:24       ` Josh Triplett
2016-08-19 22:59         ` Dave Watson
2016-08-25 17:08     ` Mathieu Desnoyers
2016-08-25 17:56       ` Ben Maurer [this message]
2016-08-27  4:22         ` Josh Triplett
2016-08-29 15:16           ` Mathieu Desnoyers
2016-08-29 16:10             ` Josh Triplett
2016-08-30  2:01             ` Boqun Feng
2016-08-27 12:21   ` Pavel Machek
2016-08-19 20:07 ` [RFC PATCH v8 2/9] tracing: instrument restartable sequences Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 3/9] Restartable sequences: ARM 32 architecture support Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 4/9] Restartable sequences: wire up ARM 32 system call Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 5/9] Restartable sequences: x86 32/64 architecture support Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 6/9] Restartable sequences: wire up x86 32/64 system call Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 7/9] Restartable sequences: powerpc architecture support Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 8/9] Restartable sequences: Wire up powerpc system call Mathieu Desnoyers
2016-08-19 20:07 ` [RFC PATCH v8 9/9] Restartable sequences: self-tests Mathieu Desnoyers
2016-11-26 23:43 [RFC PATCH v8 1/9] Restartable sequences system call Paul Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1BD94294-23AD-40CC-8227-43AE1AD8092F@fb.com \
    --to=bmaurer@fb.com \
    --cc=ahh@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=boqun.feng@gmail.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=davejwatson@fb.com \
    --cc=hpa@zytor.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=luto@amacapital.net \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).