From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
	linux-api <linux-api@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	David Laight <David.Laight@ACULAB.COM>,
	carlos <carlos@redhat.com>, Peter Oskolkov <posk@posk.io>
Subject: Re: [RFC PATCH 2/3] rseq: extend struct rseq with per thread group vcpu id
Date: Tue, 1 Feb 2022 15:22:10 -0500 (EST)
Message-ID: <1075473571.25688.1643746930751.JavaMail.zimbra@efficios.com>
In-Reply-To: <87bkzqz75q.fsf@mid.deneb.enyo.de>

----- On Feb 1, 2022, at 3:03 PM, Florian Weimer fw@deneb.enyo.de wrote:

> * Mathieu Desnoyers:
> 
>> If a thread group has fewer threads than cores, or is limited to run on
>> few cores concurrently through sched affinity or cgroup cpusets, the
>> virtual cpu ids will be values close to 0, thus allowing efficient use
>> of user-space memory for per-cpu data structures.
> 
> From a userspace programmer perspective, what's a good way to obtain a
> reasonable upper bound for the possible tg_vcpu_id values?

Some effective upper bounds:

- sysconf(3) _SC_NPROCESSORS_CONF,
- the number of threads which exist concurrently in the process,
- the number of cpus in the cpu affinity mask applied by sched_setaffinity,
  except in corner-case situations such as cpu hotplug removing all cpus from
  the affinity set,
- cgroup cpuset "partition" limits.

Note that AFAIR non-partition cgroup cpusets allow a cgroup to "borrow"
additional cores from the rest of the system if they are idle, therefore
allowing the number of threads running concurrently to go beyond the
specified limit.
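
As a rough illustration of combining the bounds listed above, here is a
minimal user-space sketch. The helper name vcpu_prealloc_hint() and its
max_threads parameter are made up for this example and are not part of the
patch set. It only consults sysconf(3) and sched_getaffinity(2); it does not
look at cgroup cpusets, so the result is a pre-allocation hint rather than a
hard guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): returns a hint for how many
 * per-vcpu slots to pre-allocate. max_threads is the caller's own estimate
 * of how many threads exist concurrently; pass 0 if unknown.
 */
static long vcpu_prealloc_hint(long max_threads)
{
	long hint = sysconf(_SC_NPROCESSORS_CONF);
	cpu_set_t mask;

	if (hint < 1)
		hint = 1;	/* sysconf failed: fall back to one slot */

	/* Narrow by the affinity mask when it can be read. */
	if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
		long allowed = CPU_COUNT(&mask);

		if (allowed > 0 && allowed < hint)
			hint = allowed;
	}

	/* Narrow by the number of concurrently existing threads. */
	if (max_threads > 0 && max_threads < hint)
		hint = max_threads;

	assert(hint >= 1);
	return hint;
}
```

A single-threaded process would call vcpu_prealloc_hint(1) and get 1 back,
whatever the size of the machine.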

> 
> I believe not all users of cgroup cpusets change the affinity mask.

AFAIR the sched affinity mask is set independently of the cgroup cpuset.
They are two separate mechanisms which both affect scheduler task placement.

I would expect the user-space code to use some sensible upper bound as a
hint about how many per-vcpu data structure elements to expect (and how many
to pre-allocate), but have a "lazy initialization" fall-back in case the
vcpu id goes up to the number of configured processors - 1. And I suspect
that even the number of configured processors may change with CRIU.
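
To make the "lazy initialization" fall-back more concrete, here is a
minimal sketch, assuming C11 atomics. The names struct vcpu_table,
struct percpu_counter, vcpu_table_get() and vcpu_table_init() are invented
for this example. The pointer array is sized for the hard bound (which is
cheap), only the first "prealloc" elements are allocated up front, and any
other element is allocated on first use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct percpu_counter { long count; };	/* hypothetical per-vcpu payload */

struct vcpu_table {
	size_t nr_slots;	/* hard bound, e.g. configured processors */
	_Atomic(struct percpu_counter *) *slots;
};

/* Return the element for vcpu_id, allocating it on first use. */
static struct percpu_counter *vcpu_table_get(struct vcpu_table *t,
					     size_t vcpu_id)
{
	struct percpu_counter *p, *expected = NULL;

	if (vcpu_id >= t->nr_slots)
		return NULL;	/* beyond the hard bound: caller must cope */
	p = atomic_load_explicit(&t->slots[vcpu_id], memory_order_acquire);
	if (p)
		return p;	/* fast path: already initialized */
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	/* Publish our element unless another thread won the race. */
	if (atomic_compare_exchange_strong(&t->slots[vcpu_id], &expected, p))
		return p;
	free(p);
	return expected;	/* the winner's element */
}

/* Allocate the pointer array and pre-allocate the first prealloc elements. */
static int vcpu_table_init(struct vcpu_table *t, size_t nr_slots,
			   size_t prealloc)
{
	size_t i;

	t->slots = malloc(nr_slots * sizeof(*t->slots));
	if (!t->slots)
		return -1;
	t->nr_slots = nr_slots;
	for (i = 0; i < nr_slots; i++)
		atomic_init(&t->slots[i], NULL);
	for (i = 0; i < prealloc && i < nr_slots; i++)
		if (!vcpu_table_get(t, i))
			return -1;
	return 0;
}
```

The NULL return for an out-of-range id is deliberate: if the number of
configured processors can change (e.g. with CRIU), the caller needs some
slow path for ids the table was not sized for.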

> 
>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>> index 13f6d0419f31..37b43735a400 100644
>> --- a/kernel/rseq.c
>> +++ b/kernel/rseq.c
>> @@ -86,10 +86,14 @@ static int rseq_update_cpu_node_id(struct task_struct *t)
>>  	struct rseq __user *rseq = t->rseq;
>>  	u32 cpu_id = raw_smp_processor_id();
>>  	u32 node_id = cpu_to_node(cpu_id);
>> +	u32 tg_vcpu_id = task_tg_vcpu_id(t);
>>  
>>  	if (!user_write_access_begin(rseq, t->rseq_len))
>>  		goto efault;
>>  	switch (t->rseq_len) {
>> +	case offsetofend(struct rseq, tg_vcpu_id):
>> +		unsafe_put_user(tg_vcpu_id, &rseq->tg_vcpu_id, efault_end);
>> +		fallthrough;
>>  	case offsetofend(struct rseq, node_id):
>>  		unsafe_put_user(node_id, &rseq->node_id, efault_end);
>>  		fallthrough;
> 
> Is the switch really useful?  I suspect it's faster to just write as
> much as possible all the time.  The switch should be well-predictable
> if running uniform userspace, but still …

The switch ensures the kernel does not try to write to a memory area beyond
the rseq size which has been registered by user-space. So it seems to be
useful to ensure we don't corrupt user-space memory. Or am I missing your
point?
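
To illustrate what I mean, here is a user-space model of the size-gated
fallthrough pattern. It is only a sketch: struct mock_area is a simplified
layout invented for this example (it is not the real struct rseq), and
mock_update() stands in for the kernel-side update, with memcpy in place of
unsafe_put_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical, simplified layout; NOT the real struct rseq. */
struct mock_area {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint32_t node_id;	/* first extension */
	uint32_t tg_vcpu_id;	/* second extension */
};

static void put_u32(void *buf, size_t off, uint32_t val)
{
	memcpy((unsigned char *)buf + off, &val, sizeof(val));
}

/*
 * Write only the fields which fit within registered_len. Each case handles
 * one known size and falls through to the older, smaller layouts.
 */
static int mock_update(void *buf, size_t registered_len,
		       uint32_t cpu, uint32_t node, uint32_t vcpu)
{
	switch (registered_len) {
	case offsetofend(struct mock_area, tg_vcpu_id):
		put_u32(buf, offsetof(struct mock_area, tg_vcpu_id), vcpu);
		/* fall through */
	case offsetofend(struct mock_area, node_id):
		put_u32(buf, offsetof(struct mock_area, node_id), node);
		/* fall through */
	case offsetofend(struct mock_area, cpu_id):
		put_u32(buf, offsetof(struct mock_area, cpu_id_start), cpu);
		put_u32(buf, offsetof(struct mock_area, cpu_id), cpu);
		return 0;
	default:
		return -1;	/* unknown size: write nothing */
	}
}
```

With a length registered up to node_id, the bytes where tg_vcpu_id would
live are never touched, which is the property the switch is there to
provide.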

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
