From: Mathieu Desnoyers
Date: Mon, 17 Oct 2022 12:05:34 -0400
Subject: Re: [PATCH v4 00/25] RSEQ node id and virtual cpu id extensions
To: Florian Weimer
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Thomas Gleixner,
 "Paul E . McKenney", Boqun Feng, "H . Peter Anvin", Paul Turner,
 linux-api@vger.kernel.org, Christian Brauner, David.Laight@ACULAB.COM,
 carlos@redhat.com, Peter Oskolkov, Alexander Mikhalitsyn
Message-ID: <55ade976-efac-3a89-f5e4-9008b7030388@efficios.com>
In-Reply-To: <8735bv25k2.fsf@oldenburg.str.redhat.com>
References: <20220922105941.237830-1-mathieu.desnoyers@efficios.com>
 <8735bv25k2.fsf@oldenburg.str.redhat.com>
X-Mailing-List: linux-api@vger.kernel.org

On 2022-10-10 09:04, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
>> Extend the rseq ABI to expose a NUMA node ID and a vm_vcpu_id field.
>>
>> The NUMA node ID field allows implementing a faster getcpu(2) in libc.
>>
>> The virtual cpu id allows ideal scaling (down or up) of user-space
>> per-cpu data structures. The virtual cpu ids allocated within a memory
>> space are tracked by the scheduler, which takes into account the number
>> of concurrently running threads, thus implicitly considering the number
>> of threads, the cpu affinity, the cpusets applying to those threads, and
>> the number of logical cores on the system.
>
> Do you have some code that shows how the userspace application handshake
> is supposed to work with the existing three __rseq_* symbols?  Maybe I'm
> missing something.

See https://lore.kernel.org/lkml/20220922105941.237830-5-mathieu.desnoyers@efficios.com/

+static
+unsigned int get_rseq_feature_size(void)
+{
+	unsigned long auxv_rseq_feature_size, auxv_rseq_align;
+
+	auxv_rseq_align = getauxval(AT_RSEQ_ALIGN);
+	assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE);
+
+	auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
+	assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE);
+	if (auxv_rseq_feature_size)
+		return auxv_rseq_feature_size;
+	else
+		return ORIG_RSEQ_FEATURE_SIZE;
+}

then in rseq_init():

+	rseq_feature_size = get_rseq_feature_size();
+	if (rseq_feature_size == ORIG_RSEQ_FEATURE_SIZE)
+		rseq_size = ORIG_RSEQ_ALLOC_SIZE;
+	else
+		rseq_size = RSEQ_THREAD_AREA_ALLOC_SIZE;

Then using it for e.g. node_id:

https://lore.kernel.org/lkml/20220922105941.237830-6-mathieu.desnoyers@efficios.com/

+#ifndef rseq_sizeof_field
+#define rseq_sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
+#endif
+
+#ifndef rseq_offsetofend
+#define rseq_offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER) + rseq_sizeof_field(TYPE, MEMBER))
+#endif

+static inline bool rseq_node_id_available(void)
+{
+	return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, node_id);
+}
+
+/*
+ * Current NUMA node number.
+ */
+static inline uint32_t rseq_current_node_id(void)
+{
+	assert(rseq_node_id_available());
+	return RSEQ_ACCESS_ONCE(rseq_get_abi()->node_id);
+}

>
> From an application perspective, it would be best to add 8 more shared
> bytes in use, to push the new feature size over 32.  This would be
> clearly visible in __rseq_size, helping applications a lot.

[ I guess you meant 12 bytes ]

The fool-proof approach here would be to skip the 12 bytes of padding
currently at the end of struct rseq. Maybe this is something we should
do in order to make sure the userspace check is regular for all fields.

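To make the concern concrete, here is a minimal sketch (not taken from
the patch series). The struct demo_rseq layout, the demo_* names and
the two extra fields are hypothetical and only serve as illustration.
It assumes the area size userspace already sees for the original
layout is 32 bytes, i.e. 20 bytes of fields followed by the 12 bytes
of padding:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the kernel's struct rseq. */
struct demo_rseq {
	uint32_t cpu_id_start;	/* offset 0 */
	uint32_t cpu_id;	/* offset 4 */
	uint64_t rseq_cs;	/* offset 8 */
	uint32_t flags;		/* offset 16: original fields end at byte 20 */
	uint32_t in_padding;	/* offset 20: hypothetical field inside the old padding */
	uint32_t pad[2];	/* offsets 24..31: rest of the 12 padding bytes */
	uint32_t past_padding;	/* offset 32: hypothetical field after the padding */
};

/* Assumed size of the original area as seen by userspace. */
#define DEMO_ORIG_ALLOC_SIZE	32

#define demo_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Check the layout assumptions stated above at compile time. */
_Static_assert(demo_offsetofend(struct demo_rseq, flags) == 20,
	       "original fields are assumed to end at byte 20");
_Static_assert(offsetof(struct demo_rseq, past_padding) == DEMO_ORIG_ALLOC_SIZE,
	       "past_padding is assumed to start right after the padding");

/* Regular check: a field is usable when the advertised size covers it. */
#define demo_field_available(size, MEMBER) \
	((size_t)(size) >= demo_offsetofend(struct demo_rseq, MEMBER))
```

With an advertised size of 32, demo_field_available() reports
in_padding as present even if nothing ever populates it, whereas
past_padding is only reported once the advertised size grows beyond
32. Under these assumptions, placing new fields after the padding lets
the same size comparison work for every field.
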
>
> Alternatively, we could sacrifice a bit to indicate that this round
> of extensions is present.  But we'll need another bit to indicate that
> the last remaining 4 bytes are in use, for consistency.  Or come up with
> something to put there today.  The TID seems like an obvious choice.

Whatever we add into those bits would need to be "special" and use
something like a flag check to validate whether the field is populated
or not. Perhaps keeping things simpler and skipping those 12 bytes
entirely is preferable.

>
> If we want to go the 8 more bytes route, TID and PID should be
> uncontroversial?  The PID cache is clearly something that userspace
> likes, not just as a defeat device for the old BYTE benchmark.

I agree that having the PID and TID there might be relevant, but I would
prefer to have all fields use a check that is regular from the point of
view of userspace. This minimizes the risk of user errors.

Thoughts?

Thanks,

Mathieu

>
> Thanks,
> Florian
>

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com