From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Ben Maurer <bmaurer@fb.com>
Cc: Josh Triplett <josh@joshtriplett.org>,
	Shane M Seymour <shane.seymour@hpe.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Turner <pjt@google.com>, Andrew Hunter <ahh@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	linux-api <linux-api@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Watson <davejwatson@fb.com>, Chris Lameter <cl@linux.com>,
	Ingo Molnar <mingo@redhat.com>, rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [RFC PATCH 0/3] Implement getcpu_cache system call
Date: Wed, 13 Jan 2016 00:22:29 +0000 (UTC)
Message-ID: <484967406.344576.1452644549992.JavaMail.zimbra@efficios.com>
In-Reply-To: <5CDDBDF2D36D9F43B9F5E99003F6A0D49A1426A5@PRN-MBX02-1.TheFacebook.com>

----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer@fb.com wrote:

>> One idea I have would be to let the kernel reserve some space either after the
>> first stack address (for a stack growing down) or at the beginning of the
>> allocated TLS area for each thread in copy_thread_tls() by fiddling with
>> sp or the tls base address when creating a thread.
> 
> Could this be implemented by having glibc use a well-known symbol name to define
> the per-thread TLS area? If a high-performance application wants to avoid any
> relocations in accessing this variable it would define it and that definition
> would override glibc's. This is how things work with malloc. glibc has a
> default malloc implementation but we link jemalloc directly into our binaries.
> In addition to changing the malloc implementation, this means that calls to
> malloc don't go through the PLT.

Just to make sure I understand your proposal: glibc (or bionic...) would
define a well-known symbol with a weak attribute, e.g.:

int32_t __thread __attribute__((weak)) __getcpu_cache;

so that applications which care about bypassing the PLT can override it with:

int32_t __thread __getcpu_cache;

glibc/bionic would be responsible for calling the getcpu_cache() system call
to register/unregister this TLS variable for each thread.
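
A minimal sketch of the library side of this scheme, for illustration only.
It assumes GCC or Clang on an ELF target. The getcpu_cache() system call from
this RFC is not invoked here; getcpu_cache_addr() and read_cached_cpu() are
hypothetical helper names, not part of glibc or of the patch set.

```c
#include <stdint.h>

/*
 * Library side (glibc, bionic, or a tracer such as lttng-ust): a weak
 * per-thread definition. A strong definition of the same symbol in the
 * application takes precedence at link time.
 */
int32_t __thread __attribute__((weak)) __getcpu_cache;

/*
 * Hypothetical helper: returns the calling thread's address of the
 * variable, i.e. the pointer a real implementation would pass to the
 * getcpu_cache() system call when registering the thread.
 */
static inline int32_t *getcpu_cache_addr(void)
{
	return &__getcpu_cache;
}

/*
 * Hypothetical fast path: a plain TLS load. The value is only
 * meaningful once the kernel has been told about the address.
 */
static inline int32_t read_cached_cpu(void)
{
	return __getcpu_cache;
}
```

An application that wants its own definition would simply drop the weak
attribute from the declaration, as shown above.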

One thing I would like to figure out is whether we can do this in a way that
lets applications and libraries (e.g. the lttng-ust tracer) start using
getcpu_cache() before glibc implements it, while staying forward compatible
with the eventual glibc support.

We can declare __getcpu_cache as a weak symbol in arbitrary libraries, and
make them register/unregister the cache through the getcpu_cache syscall.
The main thing that I would need to tweak at the kernel level within the
system call would be to keep a per-thread refcount of how many times
__getcpu_cache has been registered. This would allow multiple registrations,
one per library (e.g. lttng-ust) and one for glibc, but we would validate
that they all register the exact same address for a given thread.
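
The refcounting rule can be sketched in plain C as follows. This is a
userspace model of the idea, not kernel code from the patch set: the struct,
the function names, and the choice of -EBUSY and -EINVAL as error codes are
all assumptions made for illustration, and refcount overflow is ignored.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping (would be per-task state in the kernel). */
struct getcpu_cache_state {
	int32_t *addr;         /* address registered for this thread, or NULL */
	unsigned int refcount; /* number of outstanding registrations */
};

/* Register addr for the thread; every registration must use the same address. */
static int getcpu_cache_register(struct getcpu_cache_state *st, int32_t *addr)
{
	if (!addr)
		return -EINVAL;
	if (st->refcount && st->addr != addr)
		return -EBUSY;	/* assumed error code for an address mismatch */
	st->addr = addr;
	st->refcount++;
	return 0;
}

/* Drop one registration; the address is forgotten when the count reaches zero. */
static int getcpu_cache_unregister(struct getcpu_cache_state *st, int32_t *addr)
{
	if (!st->refcount || st->addr != addr)
		return -EINVAL;
	if (--st->refcount == 0)
		st->addr = NULL;
	return 0;
}
```

With this rule, glibc and lttng-ust can each register the same
&__getcpu_cache for a thread, and the kernel keeps updating it until the last
of them unregisters.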

The reference counting trick should also work for cases where applications
define a non-weak __getcpu_cache, and want to call the getcpu_cache
system call to register it themselves (before glibc adds support for it).

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
