linux-api.vger.kernel.org archive mirror
From: Thierry Delisle <tdelisle@uwaterloo.ca>
To: <posk@google.com>
Cc: <avagin@google.com>, <bsegall@google.com>, <jannh@google.com>,
	<jnewsome@torproject.org>, <joel@joelfernandes.org>,
	<linux-api@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<mingo@redhat.com>, <mkarsten@uwaterloo.ca>,
	<pabuhr@uwaterloo.ca>, <peterz@infradead.org>, <pjt@google.com>,
	<posk@posk.io>, <tdelisle@uwaterloo.ca>, <tglx@linutronix.de>
Subject: Re: [RFC PATCH 3/3 v0.2] sched/umcg: RFC: implement UMCG syscalls
Date: Mon, 12 Jul 2021 17:44:18 -0400
Message-ID: <acad5960-30b2-3693-9117-e0b054ee97a7@uwaterloo.ca>
In-Reply-To: <CAPNVh5f3H7Gor-Dph7=2jAdme-4mRfCCb0gv=wjgHQtd7Cad=Q@mail.gmail.com>

 > sys_umcg_wait without next_tid puts the task in UMCG_IDLE state; wake
 > wakes it. These are standard sched operations. If they are emulated
 > via futexes, fast context switching will require something like
 > FUTEX_SWAP that was NACKed last year.

I understand these wait and wake semantics and the need for the fast
context-switch(swap). As I see it, you need 3 operations:

- SWAP: context-switch directly to a different thread, no scheduler involved
- WAIT: block current thread, go back to server thread
- WAKE: unblock target thread, add it to scheduler, e.g. through
         idle_workers_ptr

There are no existing syscalls that handle SWAP, so I agree sys_umcg_wait is
needed for this to work.

However, sys_futex already exists to handle WAIT and WAKE. When a worker
calls either sys_futex WAIT or sys_umcg_wait with next_tid == NULL, in both
cases the worker will block, SWAP to the server, and wait for FUTEX_WAKE or
UMCG_WAIT_WAKE_ONLY respectively. It's not obvious to me that there would be
a performance difference, and the semantics seem to be the same to me.
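
To make the comparison concrete, here is a minimal user-space sketch of WAIT
and WAKE built on plain sys_futex. The struct worker, its wake_pending word
and the two function names are hypothetical and not part of the patch; the
sketch only shows the blocking and unblocking side and says nothing about how
the kernel would SWAP to the server.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-worker futex word: 0 = must block, 1 = wakeup pending. */
struct worker {
	_Atomic uint32_t wake_pending;
};

/* WAIT: block the calling worker until worker_wake() has been called. */
void worker_wait(struct worker *w)
{
	/* Consume a pending wakeup, or sleep while the word is still 0. */
	while (atomic_exchange(&w->wake_pending, 0) == 0) {
		long rc = syscall(SYS_futex, (uint32_t *)&w->wake_pending,
				  FUTEX_WAIT, 0, NULL, NULL, 0);

		/* EAGAIN: the word changed before we slept; EINTR: a signal.
		 * Both are retried; any other error gives up without a wakeup. */
		if (rc == -1 && errno != EAGAIN && errno != EINTR)
			break;
	}
}

/* WAKE: unblock the target worker, or make its next wait return at once. */
void worker_wake(struct worker *w)
{
	atomic_store(&w->wake_pending, 1);
	syscall(SYS_futex, (uint32_t *)&w->wake_pending,
		FUTEX_WAKE, 1, NULL, NULL, 0);
}
```

A wake that arrives before the wait is not lost: the word is already 1, so
worker_wait returns without entering the kernel.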

So what I am asking is: is UMCG_WAIT_WAKE_ONLY needed?

Is the idea to support workers directly context-switching among each other,
without involving server threads and without going through idle_servers_ptr?

If so, can you explain some of the intended state transitions in this case?


 > > However, I do not understand how the userspace is expected to use it. I also
 > > do not understand if these link fields form a stack or a queue and where is
 > > the head.
 >
 > When a server has nothing to do (no work to run), it is put into IDLE
 > state and added to the list. The kernel wakes an IDLE server if a
 > blocked worker unblocks.

From the code in umcg_wq_worker_running (Step 3), I am guessing users are
expected to provide a global head somewhere in memory and
umcg_task.idle_servers_ptr points to the head of the list for all workers.
Servers are then added in user space using atomic_stack_push_user. Is this
correct? I did not find any documentation on the list head.

I like the idea that each worker thread points to a given list; it allows the
possibility of separate containers with their own independent servers, workers
and scheduling. However, it seems that the list itself could be implemented
using existing kernel APIs, for example a futex or an eventfd. Like so:

struct umcg_task {
      [...]

      /**
       * @idle_futex_ptr: pointer to a futex used for idle server threads.
       *
       * When waking a worker, the kernel decrements the pointed-to futex
       * value if it is non-zero and wakes a server if the decrement occurred.
       *
       * Server threads that have no work to do should increment the futex
       * value and FUTEX_WAIT.
       */
      uint64_t    idle_futex_ptr;    /* r/w */

      [...]
} __attribute__((packed, aligned(8 * sizeof(__u64))));

I believe the futex approach, like the list, has the advantage that when there
are no idle servers, checking the list requires no locking. I don't know if
that can be achieved with eventfd.
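
As a rough illustration of the protocol I have in mind, here is a user-space
sketch of both sides. In the proposal the waker side would run in the kernel;
here it is emulated with C11 atomics so the logic can be read in one place.
The names idle_servers, futex_op, server_go_idle and wake_idle_server are
hypothetical, and the word is 32 bits because that is what sys_futex operates
on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical word that idle_futex_ptr would point to: idle server count. */
_Atomic uint32_t idle_servers;

long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, (uint32_t *)uaddr, op, val, NULL, NULL, 0);
}

/* Server side: announce idleness, then sleep until the word changes. */
void server_go_idle(void)
{
	uint32_t seen = atomic_fetch_add(&idle_servers, 1) + 1;

	/* Fails with EAGAIN if the word no longer equals 'seen'. Another
	 * server going idle also changes the word, so the caller must
	 * recheck for work after this returns. */
	futex_op(&idle_servers, FUTEX_WAIT, seen);
}

/* Waker side: decrement if non-zero, wake a server only if we decremented. */
bool wake_idle_server(void)
{
	uint32_t cur = atomic_load(&idle_servers);

	while (cur != 0) {
		/* On failure 'cur' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&idle_servers, &cur, cur - 1)) {
			long woken = futex_op(&idle_servers, FUTEX_WAKE, 1);

			assert(woken >= 0);
			return true;
		}
	}
	return false;	/* no idle servers: one atomic load, no lock, no syscall */
}
```

The empty case is the one I care about: when the count is zero the waker does
a single atomic load and returns. The sketch does not try to solve spurious
returns on the server side; a real design would need to decide who repairs
the count in that case.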

