From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753503AbcAMBrG (ORCPT ); Tue, 12 Jan 2016 20:47:06 -0500
Received: from relay3-d.mail.gandi.net ([217.70.183.195]:40197 "EHLO
	relay3-d.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751224AbcAMBrE (ORCPT ); Tue, 12 Jan 2016 20:47:04 -0500
X-Originating-IP: 50.39.163.18
User-Agent: K-9 Mail for Android
In-Reply-To: <484967406.344576.1452644549992.JavaMail.zimbra@efficios.com>
References: <1451977320-4886-1-git-send-email-mathieu.desnoyers@efficios.com>
	<20160111230306.GC28717@cloud>
	<137700396.343696.1452559758752.JavaMail.zimbra@efficios.com>
	<20160112024549.GA6488@x>
	<9F8D25C2-B5EE-479D-BD61-0FE466962B9E@fb.com>
	<467525713.343916.1452604549209.JavaMail.zimbra@efficios.com>
	<5CDDBDF2D36D9F43B9F5E99003F6A0D49A1426A5@PRN-MBX02-1.TheFacebook.com>
	<484967406.344576.1452644549992.JavaMail.zimbra@efficios.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
Subject: Re: [RFC PATCH 0/3] Implement getcpu_cache system call
From: Josh Triplett
Date: Tue, 12 Jan 2016 16:51:53 -0800
To: Mathieu Desnoyers, Ben Maurer
CC: Shane M Seymour, Thomas Gleixner, Paul Turner, Andrew Hunter,
	Peter Zijlstra, linux-kernel@vger.kernel.org, linux-api,
	Andy Lutomirski, Andi Kleen, Dave Watson, Chris Lameter,
	Ingo Molnar, rostedt, "Paul E. McKenney", Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk
Message-ID: <33D8C4C1-472E-4C3E-B722-B890FD592E23@joshtriplett.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On January 12, 2016 4:22:29 PM PST, Mathieu Desnoyers wrote:
>----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer@fb.com wrote:
>
>>> One idea I have would be to let the kernel reserve some space either after the
>>> first stack address (for a stack growing down) or at the beginning of the
>>> allocated TLS area for each thread in copy_thread_tls() by fiddling with
>>> sp or the tls base address when creating a thread.
>>
>> Could this be implemented by having glibc use a well known symbol name to define
>> the per-thread TLS area? If an high performance application wants to avoid any
>> relocations in accessing this variable it would define it and that definition
>> would override glibc's. This is how things work with malloc. glibc has a
>> default malloc implementation but we link jemalloc directly into our binaries.
>> in addition to changing the malloc implementation this means that calls to
>> malloc don't go through the PLT.
>
>Just to make sure I understand your proposal: defining a well known symbol
>with a weak attribute in glibc (or bionic...), e.g.:
>
>int32_t __thread __attribute__((weak)) __getcpu_cache;
>
>so that applications which care about bypassing the PLT can override it with:
>
>int32_t __thread __getcpu_cache;
>
>glibc/bionic would be responsible for calling the getcpu_cache() system call
>to register/unregister this TLS variable for each thread.
>
>One thing I would like to figure out is whether we can use this in a way that
>would allow introducing getcpu_cache() into applications and libraries
>(e.g. lttng-ust tracer) before it gets implemented into glibc, in a way that
>would keep forward compatibility for whenever it gets introduced in glibc.
>
>We can declare __getcpu_cache as a weak symbol in arbitrary libraries, and
>make them register/unregister the cache through the getcpu_cache syscall.
>The main thing that I would need to tweak at the kernel level within the
>system call would be to keep a refcount of the number of times the
>__getcpu_cache is registered per thread. This would allow multiple registrations,
>one per library (e.g. lttng-ust) and one for glibc, but we would validate
>that they all register the exact same address for a given thread.
>
>The reference counting trick should also work for cases where applications
>define a non-weak __getcpu_cache, and want to call the getcpu_cache
>system call to register it themselves (before glibc adds support for it).

This seems like something better done in a tiny common library, rather than the kernel or by playing symbol resolution games.
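
For illustration only, here is a rough sketch in C of how the per-thread reference counting described in the quoted text could live in a small userspace library instead of the kernel. Everything except the __getcpu_cache symbol name is an assumption made for this sketch: the function names getcpu_cache_lib_register() and getcpu_cache_lib_unregister(), the choice of -EBUSY and -EINVAL as error codes, and the stub sys_getcpu_cache_stub(), which merely stands in for the proposed getcpu_cache system call (its real interface is not shown in this message). It also assumes GCC or Clang on an ELF target, where __thread and weak definitions are available.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Weak, well-known TLS symbol, as in the quoted proposal. An application
 * that provides a strong definition of the same name overrides this one. */
__thread int32_t __getcpu_cache __attribute__((weak));

/* Per-thread bookkeeping kept by the (hypothetical) common library. */
struct getcpu_cache_reg {
	int32_t *addr;         /* address currently registered, or NULL */
	unsigned int refcount; /* number of outstanding registrations */
};

static __thread struct getcpu_cache_reg reg;

/* HYPOTHETICAL stand-in for the proposed system call. A real library would
 * invoke the kernel here; this stub only pretends that the call succeeded.
 * A NULL address represents unregistration in this sketch. */
static int sys_getcpu_cache_stub(int32_t *addr)
{
	(void)addr;
	return 0;
}

/* Register addr for the calling thread. Only the first registration reaches
 * the (stubbed) system call; later ones must use the same address and just
 * take a reference. */
int getcpu_cache_lib_register(int32_t *addr)
{
	int ret;

	if (addr == NULL)
		return -EINVAL;
	if (reg.refcount > 0) {
		if (reg.addr != addr)
			return -EBUSY; /* a different address is already in use */
		reg.refcount++;
		return 0;
	}
	ret = sys_getcpu_cache_stub(addr);
	if (ret)
		return ret;
	reg.addr = addr;
	reg.refcount = 1;
	return 0;
}

/* Drop one reference for the calling thread. The last unregistration tells
 * the (stubbed) kernel side to stop using the address. */
int getcpu_cache_lib_unregister(int32_t *addr)
{
	if (reg.refcount == 0 || reg.addr != addr)
		return -EINVAL;
	assert(reg.addr != NULL); /* holds whenever refcount is non-zero */
	if (--reg.refcount == 0) {
		reg.addr = NULL;
		return sys_getcpu_cache_stub(NULL);
	}
	return 0;
}
```

With something along these lines, a tracer such as lttng-ust and, later, glibc could each call getcpu_cache_lib_register(&__getcpu_cache) when a thread starts: the second caller only bumps the count, and a caller passing a different address for the same thread is rejected rather than silently replacing the first registration.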